RE-EXECUTION OF EDIT-COMPILE-RUN
CYCLES FOR CHANGED LINES OF SOURCE
CODE, WITH STORAGE OF ASSOCIATED DATA
IN BUFFERS

MICROFICHE APPENDIX 

Included as a part of this application under 37 C.F.R.
§ 1.96(b) is a Microfiche Appendix consisting of 5
microfiche and 202 frames.

RELATED CASES 

This application discloses subject matter also disclosed
in our copending applications filed concurrently
herewith, as follows:
Ser. No. 375,397 for "INCREMENTAL COMPILER
FOR SOURCE-CODE DEVELOPMENT SYSTEM",
U.S. Pat. No. 5,182,806;
Ser. No. 375,383 for "INCREMENTAL-SCANNING
COMPILER FOR SOURCE-CODE DEVELOPMENT
SYSTEM", U.S. Pat. No. 5,170,465;
Ser. No. 375,398 for "INCREMENTAL LINKING
IN SOURCE-CODE DEVELOPMENT SYSTEM",
U.S. Pat. No. 5,193,191;
Ser. No. 375,401 for "COMPILER WITH INCREMENTAL
DEPENDENCY ANALYSIS FOR
SOURCE-CODE DEVELOPMENT SYSTEM",
now abandoned;
Ser. No. 375,402 for "LINE-SKIP COMPILER FOR
SOURCE-CODE DEVELOPMENT SYSTEM",
U.S. Pat. No. 5,201,050; and
Ser. No. 375,399 for "VIRTUAL MEMORY MANAGEMENT
FOR SOURCE-CODE DEVELOPMENT
SYSTEM", now abandoned.
All of said applications are assigned to Digital Equipment
Corporation, the assignee of this application.

BACKGROUND OF THE INVENTION 

This invention relates to computer programming, and
more particularly to computer aided software development.

The purpose of the invention is to provide a programming
environment designed to enhance the speed and
productivity of software development, particularly a
method for substantially decreasing the time required
for recompilation and relinking in the edit-compile-link-run
cycle of the software development process. When
code is being written, the elapsed time through the
edit-compile-link-run cycle after the user makes a small
change to the application source code is called the
turnaround time. A primary purpose of the invention is to
minimize this turnaround time.

The programming "environment" as the term is used
herein means the set of programs or modules (i.e., code)
used to implement the edit-compile-link-run cycle for a
developer, who is ordinarily seated at a terminal and
engaged in the endeavor of writing code. The environment
which is the subject of this invention will be called
"the environment", whereas any program being developed
under the environment will be called "an application".
The environment is capable of supporting the
development of any application, including the environment
itself. The user of the environment is called "the
developer", while the user of an application is called
"the end user".

Software development is characterized by a process
involving the steps of editing the program, compiling
and linking the program, and running the program. A
compiler translates a source program that has been
written in a high-level language such as Pascal or Fortran
into a machine executable form known as an object
program.

The software development process is further divided
into stages, with the earlier stages characterized by
rapid and large scale activity (e.g., editing) in all or most
of the application source files, and the later stages
characterized by less frequent and smaller changes in
fewer than all of the source files. During the earlier
stages the objective is removing syntax errors in the
source code and logic errors in the application. During
the final stages the objective is improving the efficiency
of the application and testing the behavior of the application
in the form in which it will be delivered to the end user.

It is generally desirable that the quality of the object 
code generated by a compiler, as measured in terms of 
efficiency, be as good as possible. A compiler that gen- 
erates very efficient object code is known as an optimiz- 
ing compiler. Optimized object code is characterized by 
maximized efficiency and minimized execution time. 
However, the complex methods and techniques em- 
ployed by optimizing compilers to produce highly effi- 
cient object code necessarily result in relatively long 
compile times. 

The removal of logic errors is relatively independent 
of the efficiency of the implementation of the applica- 
tion; therefore, during the early stages of software de- 
velopment, it is desirable that the environment empha- 
size turnaround time over optimization. In addition, 
during the early stages it is advantageous to insert ap- 
plication-run-time checks for certain kinds of detectable 
faults such as boundary overrun. The concerns with 
efficiency and testing during the final stages require 
optimization, and the lower frequency of changes 
makes the use of traditional software tools effective. 

The edit-compile-link-run cycle is typically repeated 
numerous times during development of a particular 
piece of software. At any stage of this activity the de- 
veloper may be required to correct detected errors (as 
used herein, "error" means "need for a change", since 
the motivation for making a change may be either re- 
pairing a previous oversight or adding new functional- 
ity). Errors may be detected by the compiler, the linker, 
or later by the programmer during test execution. This 
style of interaction in conventional environments results 
in frequent context changes and delays for the devel- 
oper. Context changes occur while the developer sepa- 
rately and sequentially invokes the editor, the compiler, 
the linker, and the application itself. Delays occur while
the developer waits for these separate tools to complete 
their tasks. 

Thus, while long compile times are tolerable in the
final stages of developing an application, i.e., when
generating production quality object code, these delays
are not tolerable in the early stages of the process of
developing, testing, and debugging software, where
long compile and link times will be much more noticeable
since these are invoked much more often. Moreover,
changes in the application code made during development
are usually localized and small in size with
respect to the rest of the program. In known software
development environments, the turnaround time for
compiling an application module is proportional to the
size of the module and the turnaround time for linking
an application is proportional to the size and number of
modules. In the environment of this invention both
compilation and linking turnaround are proportional to
the size of the changes to the source code made by the
developer since the last compile/link operation. Many
application programs have 100,000 to 1,000,000 or
more lines of code; the turnaround time (time for the
edit-compile-link-run loop to complete) in developing
such programs can become an overnight activity, and
thus presents a major burden.

Thus, it is desirable to provide a software development
environment that would allow fast turnaround in
the edit-compile-link-run cycle.

Examples of commercially available developmental
compilers include "Quick C™" by Microsoft Corporation,
"LightSpeed C" by Symantec Corporation,
"Turbo C" by Borland Corporation, and Saber-C by
the Saber Company. These prior systems are faster than
traditional batch compilers, and may provide some
degree of incremental (as distinguished from batch)
operation; for example, a module may be treated separately
if only that module has been changed since the
last edit-compile-link-run cycle. This level of incremental
operation is known as coarse-grained incremental
operation.

SUMMARY OF THE INVENTION

In accordance with one embodiment of the present
invention, a software development system or environment
is provided which operates generally on a fine-grain
incremental basis, in that increments as small as a
single line of code which have not been changed since
the last edit-compile-link-run cycle are reused instead of
being recompiled. An increment in one embodiment is a
line of code (or one or more lines of code), or, as will
appear, another suitable size, such as a semantic increment;
at various places in the cycle the size of the increment
corresponds to what is appropriate for that level.
As an example embodiment, a system referred to as a
rapid computer assisted software engineering and development
system (for which the acronym "RCASE" is
used below), is disclosed. This system provides a programming
environment and a number of facilities or
services designed to enhance the speed and productivity
of software development engineers, in particular by
substantially decreasing the time required for recompilation
and relinking in the edit-compile-link-run cycle
common to existing traditional software development
processes. Several different features, as disclosed
herein, are directed to achieving these goals. The
RCASE programming environment employs a fine
grain incremental (e.g., line-at-a-time) compiler including
an incremental scanner, an incremental linker, an
incremental module dependency analysis (make) facility,
and a virtual memory manager to reduce or prevent
thrashing; a context saving and switching mechanism
and a checkpoint/restart mechanism are important
features. Furthermore, the RCASE system is designed
to operate with any callable editor, callable compiler, or
callable debugger that conforms to various interface
requirements. A callable object file transformer can be
included, permitting access to applications prepared
outside of the environment. Access to runtime libraries
is also provided.

In a programming environment according to the
invention, the quality of the object code is de-emphasized
because the goal of reducing the time between
editing and running the program is paramount. To increase
the speed of the system, the object code generated
in the RCASE environment is not optimized, resides
in virtual memory, and is used only for testing the
application. An executable object code file is never
saved to disk. Therefore, upon completion of the development
phase, an optimizing compiler must be used to
generate production quality object code. Most of the
presently-available developmental compilers such as
those mentioned above have as one of their objectives
the production of saved, non-optimized, object code;
but as developers make demands upon those writing the
compilers to improve the object code generated by the
developmental compilers, the development systems
become slower. Accordingly, an important distinction
between the present invention and known software
development systems (except Saber-C) is that the
RCASE environment is directed to assisting the programmer
in producing quality source code quickly and
efficiently, as opposed to generating usable object code.

BRIEF DESCRIPTION OF THE DRAWINGS 

The novel features believed characteristic of the in- 
vention are set forth in the appended claims. The inven- 
tion itself, however, as well as other features and advan- 
tages thereof, will be best understood by reference to a 
detailed description of a specific embodiment which 
follows, when read in conjunction with the accompanying
drawings, wherein:

FIG. 1 is a simplified chart of a classic edit-compile-link-run
cycle as may be implemented using the environment
of the invention;

FIG. 2 is a simplified diagram of the components of
the environment according to one embodiment of the
present invention;

FIG. 3 is a block diagram of an example of a com- 
puter system which may use the environment of the 
invention; 

FIG. 3a is a memory map of a virtual memory system
used according to one feature of the invention;

FIG. 3b illustrates a source text image from a single
application module in contiguous pages of the virtual
memory arrangement of FIG. 3a;

FIG. 4 is a memory map illustrating the total memory 
demand of the software development system of the 
invention and the relationship between modules and 
various phases of activity in this system; 

FIG. 5 is a diagram of a part of the environment of 
the invention, illustrating the relationship between 
RCASE, the editor, and the source text and associated 
modules; 

FIG. 6 is a diagram illustrating the relationship be- 
tween the editor, the RCASE module, and the compiler 
of the system of the invention, and illustrating the con- 
text in which the incremental compiler resides; 

FIG. 6a is a diagram of the format of a linker table 
prepared by the compiler 11 for each module 12; 

FIG. 6b is a diagram of the format of an incremental 
dependency analysis table prepared by the compiler 11 
for use by the "make" function of the environment of 
one embodiment of the invention; 

FIG. 7 is a diagram of the general structure of the 
front and back ends of an incremental compiler de- 
signed according to one embodiment of the present 
invention; 

FIG. 7a is a diagram of a symbol table and symbol
table entry generated by the compiler of FIG. 7;

FIG. 8 is a diagram of the structure of a token table 
generated in the compiler of FIG. 7; 
FIG. 9 is a diagram of the source text, lexical increment,
and semantic increment relationships in operation
of the compiler of FIG. 7;

FIG. 9a illustrates the token handle list contents for 
two of the lexical increments illustrated in FIG. 9; 

FIG. 9b illustrates the contents of a semantic incre- 
ment table; and 

FIG. 9c illustrates the organization of a code incre- 
ment table corresponding to increments of FIG. 9. 

DETAILED DESCRIPTION OF SPECIFIC 
EMBODIMENTS 

With reference to FIG. 1, a chart of the edit-compile-link-run
cycle is illustrated. During software development,
this cycle is typically repeated many times. An
edit function 10 is executed so long as the developer is
writing or changing the source code. In traditional
systems the source text is in a file structure, but according
to this invention the source text is maintained in
buffers in memory. When the developer has reached a
point where he wishes to test the code he has written,
the compiler 11 is invoked. The input to the compiler 11
is the source code text produced by the editor 10. There
are typically a number of source code text buffers 12,
one for each module of the application under development;
according to one feature of the invention, those
modules 12 which have not been changed or are not
dependent upon changed code are not recompiled. If an
error is found by the compiler 11, the operation is returned
by path 13 to the editor 10 with a notification of
the location and nature of the error; a feature of the
invention is that upon recognizing the first error the
compiler reports the error, quits, and returns to the
editor, rather than completing and returning a list of all
of the errors found. If no errors occur during the compile
phase, then the object code tables 14 (and other
information as will be described, collectively referred to
as code-data-symbol buffers) produced as an output of
the compiler 11 are the input to a linker 15. Again, there
is a code table 14 associated with each source code text
module 12, as well as other data structures as will be
explained associated with each module. The linker 15
produces as an output the executable object code image
16 in memory, although again the operation is returned
to the editor 10 via path 17 if an error is encountered
when linking is attempted; again, the quit-on-first-error
principle is used. The executable code image 16 actually
consists of the code tables 14 plus a link table produced
by the linker 15 along with information from run-time
libraries, but in any event the code image (in memory) is
executed as indicated by the run function 18. If logic
errors or runtime errors are discovered during the run
phase, the programmer returns to the edit phase. The
code image 16, after being run with no error reported,
would be saved as debugged object code in traditional
systems; however, according to the present invention,
the desired objective is debugged source code, i.e., the
source text modules 12. That is, the purpose is to provide
a tool for aiding the developer in generating source
code, not object code; therefore, an optimizing compiler
would be later used to generate production-quality
object code from the debugged source text modules 12.
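
The compile step of the cycle described above, with its skipping of unchanged modules and its quit-on-first-error behavior, might be sketched in C as follows. This is only an illustrative sketch: the `module` record, its fields, the status codes, and `compile_changed` are hypothetical names that do not appear in the Microfiche Appendix, the "compilation" is simulated by a flag, and dependencies between modules are ignored for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes for the compile step of one cycle. */
typedef enum { CYCLE_OK, CYCLE_COMPILE_ERROR } cycle_status;

/* Hypothetical per-module record; field names are illustrative only. */
typedef struct {
    const char *name;
    int changed;        /* nonzero if edited since the last clean compile */
    int has_error;      /* stand-in for "the compiler would find an error" */
    int compile_count;  /* how many times this module was (re)compiled */
} module;

/* Recompile only the changed modules and stop at the first error.
 * On error, *failed receives the index of the offending module so
 * that control can return to the editor at that location. */
cycle_status compile_changed(module *mods, size_t n, size_t *failed)
{
    assert(mods != NULL && failed != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!mods[i].changed)
            continue;               /* unchanged: reuse earlier results */
        mods[i].compile_count++;
        if (mods[i].has_error) {
            *failed = i;            /* report first error and quit */
            return CYCLE_COMPILE_ERROR;
        }
        mods[i].changed = 0;        /* compiled cleanly */
    }
    return CYCLE_OK;
}
```

In this sketch a module after the failing one is not touched until the error has been corrected and the step is invoked again, which mirrors the return by path 13 to the editor.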

As FIG. 1 indicates by the paths 13, 17 and 19, an
error may be discovered at either compile, link, or run
time. Discovery of an error requires the developer to
edit the source code, and then the compile-link-run part
of the cycle is implemented again. In most systems, the
turnaround time to complete this cycle results in the
developer wasting time and losing concentration because
of dealing with slow and perhaps clumsy tools
rather than with the problems of the application itself.

FIG. 2 shows a simplified diagram representing the 
rapid computer aided software engineering or RCASE 
environment 21 of one embodiment of the present in- 
vention. RCASE is a program which accomplishes the 
bulk of its services via sharable links to other large 
programs typical of a software development environ- 
ment, such as editors, compilers, etc. For example, 
RCASE provides a means of communication and coop- 
eration between the editor 10 and the incremental com- 
piler 11 of the present invention. The editor 10 knows 
what source code in the modules 12 has been modified 
during a program editing session, and the compiler 11 
remembers via the code tables 14, and other data struc- 
tures, various expensive-to-compute values from the 
previous compile. When the editor 10 and compiler 11 
agree that an old value is still valid, the old value is 
reused and therefore need not be recomputed. The 
RCASE environment 21, in addition to the editor 10 
and the incremental compiler 11 as will be described, 
can, in a fully populated embodiment, call upon various 
other services such as a stand-alone debugger 22 of the 
type which may be commercially available, perhaps 
various batch compilers 23, run time libraries 24 for 
each language, and an object file transformer 25. In an 
expanded embodiment, the RCASE environment can 
process source code of various languages such as C, 
Fortran, Pascal, etc., as indicated by the multiple blocks 
11, although it is understood that features of the inven- 
tion may be used in compilers dedicated to one lan- 
guage, such as C. 

A listing of the source code in C language for one 
embodiment of the invention is set forth in the accom- 
panying Microfiche Appendix. This source code listing 
of the Appendix includes the following modules: 



Title                         Purpose

RCASE Sources (including Incremental Linker)
RCASE_symbols (.sdl)          constants and definitions shared by RCASE,
                              its editors and compilers
RCASE_symbols (.sdl)          constants and definitions shared by RCASE,
                              its editors and compilers
RCASE.h                       include file for RCASE
RCASE_link_tables.h           include file for RLINK table manager
RCASE_VERSION.H               RCASE versioning data for RCASE
RCASE_allocate.h              include file for RCASE Allocator
RCASE.C                       main ( ) for RCASE
RCASE_activate.c              interface for activating an editor,
                              compilers, obj processor
RCASE_allocate.c              RCASE slot allocator
RCASE_build.c                 RCASE builder
RCASE_checkpoint_restart.c    RCASE logic for checkpoint & restart
RCASE_callbacks.c             editor/compiler/etc callbacks for RCASE
RCASE_compile.c               RCASE logic for performing the
                              compilation phase
RCASE_compiler_services.c     RCASE services for callable compilers
RCASE_demon.c                 integrity checker for RCASE slot allocator
RCASE_link.c                  RCASE linker
RCASE_link_images.c           RLINK sharable images table manager
RCASE_link_symbols.c          RLINK symbol (string table) manager
RCASE_link_tables.c           RLINK table manager
RCASE_session_manager.c       RCASE logic for session management
RCASE_util.c                  utilities for RCASE
RCASE.opt                     options file for mms$exe:RCASE.exe,
                              DCASE.exe

XEDITOR Sources
XED_VERSION.H                 RCASE versioning data for stub editor
xed.h                         interfaces internal to XED editor
XED.C                         standalone-activation module for stub editor
XED_ACTIVATE.C                RCASE activation logic for stub editor
XED_ALLOC.C                   storage allocator for stub editor
XED_CME.C                     XED Command Interpreter
XED_SERVICES.C                XED services for RCASE
MMS$:XEDSHR.OPT               linker options file for building callable
                              XED editor (used by RCASE)
MMS$:XED.OPT                  linker options file for building
                              standalone XED editor

XC Sources (Incremental Compiler XC)
xc.h                          interface between major modules
xc_arithops.h                 codes for pfn expressions
xc_cfg.h                      rule names for cfg.txt for X
xc_chtype.h                   define classes of characters
xc_emit.h                     internal interface between emitter modules
xc_inctables.h                include file for XC incremental table manager
xc_parse.h                    internal interface between parser modules
xc_tokens.h                   public list of token codes
xc_vaxops.h                   give mnemonic names for some vax 11 opcodes
xc_version.h                  RCASE versioning data for proto compiler
xc.c                          XC is standalone interface to the callable
                              XC compiler. It uses the callable compiler
                              XCSHR.OBJ etc.
xc_emit_dump.c                debug emitters
xc_activate.c                 RCASE activation logic for XC compiler
xccontext.c                   context switching logic for XC
xcscan.c                      scanner interface to RCASE
xc_scanline.c                 scan line
xc_program.c                  recursive parser/generator for X programs
xc_stmt.c                     parser for X statements
xc_decl.c                     parser for X nomenclature
xcexpr.c                      recursive parser/generator for X expressions
xc_symbol.c                   skeletal symbol table for XC compiler
xc_gen.c                      generator implementation of X
xc_emitvax.c                  skeletal emitter for vax (int, bool,
                              and branches)
xcexprvax.c                   expression code generator for vax
                              (minimum optimization version)
xclinkvax.c                   link table builder
xc_incrvax.c                  incremental compilation logic for XC emitters
xcinctables.c                 XC incremental table manager for token,
                              semantic and lexical increments
xc_journal.c                  XC journal manager for Emitter and
                              Symbol Table Actions
xc_rtl.c                      runtime library for XC
xc.opt                        describe standalone XC
xcshr.opt                     describe shareable XC compiler build
xcrtl.opt                     link commands for XC runtime library



One of the important features of the present invention is
a memory management scheme designed to ensure that
for a given phase or sub-phase of the edit-compile-link-run
cycle of FIG. 1, data or text necessary for the current
operation (as well as the code to execute this phase)
remains in real memory (as opposed to being paged to
disk by the virtual memory system) and is immediately
accessible. Reduction of page faults and hard disk accesses
results in much faster turnaround in the edit-compile-link-run
cycle. This is accomplished in part by
maintaining each source module in its own contiguous
memory space. While a programmer is editing a program,
it is likely that multiple source files 12 will be
opened but only one is actually being edited at any
moment. These source text files 12 reside in an in-memory
structure called a buffer or module.

Referring to FIG. 3, a computer system which may
be used to execute the development system of the invention
is illustrated. The system includes a CPU 30 coupled
to a main memory 31 and to a disk storage facility
32 by a system bus 33. In one embodiment, the system
uses the VAX® computer architecture and the
VAX®/VMS operating system, both commercially
available from Digital Equipment Corporation; however,
other computer systems employing paging functions
and using other operating systems such as UNIX
having virtual memory management are useful as well,
and, as noted below, if the operating system does not
include paging then the same effect can be added to the
environment. The main memory 31 usually consists of
perhaps several megabytes of RAM, depending upon
the size of the system chosen, and is volatile since dynamic
RAMs are usually used. The disk storage 32, on
the other hand, has a size of perhaps many hundreds of
megabytes, and is non-volatile. The access time for the
main memory 31 is in the order of 100-nanoseconds,
whereas the disk storage has an access time measured in
tens of milliseconds. Of course, the cost per megabyte of
storage is much less for the disk storage 32 than for the
main memory 31. The CPU 30 includes an instruction
unit 34 which fetches and decodes instructions from the
memory 31, and includes an execution unit 35 for carrying
out the operation commanded by the instructions
and generating addresses for operands; in addition, a
memory management unit 36 is included, and this unit
references page tables (and usually contains a translation
buffer) for translating virtual memory addresses
generated in the instruction unit 34 or execution unit 35
into addresses in real memory 31. As depicted in FIG.
3a, a memory map of a virtual memory system, as implemented
with the typical VAX/VMS or UNIX operating
systems, includes a virtual memory space 37 having,
for a CPU architecture with 32-bit addresses,
4-Gigabytes of addressable locations (indeed, the virtual
memory space usually exceeds the size of the disk storage
32), whereas the real memory 31 may be, for example,
2-Mbytes. The virtual memory space is divided into
pages 38, with each page being, for example, 512-bytes
in size. Thus, for this example, there are eight million
page locations in virtual memory 37, but the real memory
31 can at any one time contain only about 4000
pages, and a significant part of this will be occupied by
the operating system. In a typical virtual memory management
unit 36, the page tables and translation buffer
contain an entry for each page, with each entry including
the high-order address bits of the pages currently
resident in real memory 31. When a memory reference
is made by an address generated in the execution unit 35
or instruction unit 34, and the memory management
unit 36 does not find a match in the translation buffer, or
the page tables indicate the page is not in real memory,
then a page fault is signalled. Servicing a page fault
consists of jumping to a routine in microcode or in the
operating system which executes a page swap, i.e.,
writes one of the pages in real memory 31 to the disk 32
and reads the page from disk 32 containing the address
in question into real memory 31, after which the flow
returns to the instruction which generated the page
fault. Page swap operations introduce a considerable
delay due to the disk accesses that are required, and so
are to be minimized, particularly where a system such as
the environment of the invention, making large demands
upon memory, is in use.
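
The page counts quoted in this example can be checked with a few lines of C. The sketch below is illustrative only; the function and macro names are hypothetical, and it assumes binary units (4-Gigabytes taken as 2^32 bytes, 2-Mbytes as 2^21 bytes), which is consistent with the text's "eight million" and "about 4000".

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 512u   /* example page size used in the text */

/* Number of fixed-size pages needed to cover `bytes`, rounding up. */
uint64_t page_count(uint64_t bytes, uint64_t page_size)
{
    assert(page_size > 0);
    return (bytes + page_size - 1) / page_size;
}

/* Split a 32-bit virtual address into page number and byte offset. */
uint32_t page_number(uint32_t vaddr) { return vaddr / PAGE_SIZE_BYTES; }
uint32_t page_offset(uint32_t vaddr) { return vaddr % PAGE_SIZE_BYTES; }
```

Under these assumptions, `page_count(1ULL << 32, 512)` gives 8,388,608 virtual pages and `page_count(2u * 1024 * 1024, 512)` gives 4096 real pages, so at most roughly one virtual page in two thousand can be resident at once.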

To this end, referring to FIG. 3b, for each source
module 12 a block 39 of virtual memory 37 is allocated
to hold that module in contiguous pages 38 of virtual
memory, and no text or information from any other
module (e.g., other ones of the modules 12, or any other
data such as tables 14) is allowed to infiltrate any page
38 within the memory space allocated for this module
12. This same constraint is imposed for each one of the
other modules 12, code tables 14, and all of the other
tables, buffers and the like which will be referred to
herein. This scheme prevents multiple page faults and
disk accesses while operating on a single module. This
memory management scheme is used not only for
source text modules 12, but also for the various tables
and data structures generated during the compile and
link phases of activity, i.e., when the compiler 11 or
linker 15 is executing. Allocation of memory is accomplished
in the preferred embodiment of the present
invention using the REALLOC function from the standard
C programming language. That is, while a source
text module 12 is being edited, its size may exceed the
space previously allocated for it, and so a disjoint page
would be started; however, before this happens, a
REALLOC function is performed which allocates a
contiguous block of memory to re-establish the data
structure of FIG. 3b.

Thus, as memory demand for a particular data structure
or module increases beyond the bounds of the
originally allocated block 39 of memory, a new block of
contiguous memory must be allocated. If possible, the
original block 39 will simply be extended, but this usually
results in overlap with the successive block,
thereby requiring allocation of a new block in a different
location. If a new location is used, the data must be
moved to the new location, an action automatically
accomplished by the REALLOC function. This is a
portable means of accomplishing an effect otherwise
implemented by operating system dependent mechanisms
such as "zoned memory". In any particular implementation
of RCASE the less portable but more efficient
system-specific zoned memory management facilities
may be substituted for the portable version. Note
also the implication this method has with respect to
pointers. If a table containing pointers is reallocated, the
pointers will be wrong. Therefore, to achieve the high
speed results of the present system, pointers cannot be
used in any table that may get moved during a reallocation
process.
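
The consequence for pointers can be illustrated with a short C sketch in which entries are referred to by byte offset from the start of the block rather than by address, so that references survive a move performed by `realloc`. The `module_buffer` type and the two functions are hypothetical names introduced for illustration and are not taken from the Appendix; the sketch also does not attempt to show page alignment or the exclusion of other data from the block's pages.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable block for one module; names are illustrative. */
typedef struct {
    char  *base;      /* start of the contiguous block */
    size_t used;      /* bytes in use */
    size_t capacity;  /* bytes allocated */
} module_buffer;

/* Append len bytes, growing the block with realloc when needed.
 * Returns the offset of the appended data, or (size_t)-1 on failure. */
size_t buffer_append(module_buffer *b, const char *data, size_t len)
{
    assert(b != NULL && data != NULL);
    if (b->used + len > b->capacity) {
        size_t new_cap = b->capacity ? b->capacity * 2 : 64;
        while (new_cap < b->used + len)
            new_cap *= 2;
        char *p = realloc(b->base, new_cap);  /* may move the block */
        if (p == NULL)
            return (size_t)-1;                /* old block is still valid */
        b->base = p;
        b->capacity = new_cap;
    }
    memcpy(b->base + b->used, data, len);
    size_t offset = b->used;
    b->used += len;
    return offset;
}

/* Resolve an offset to an address; valid only until the next append. */
char *buffer_at(module_buffer *b, size_t offset)
{
    assert(b != NULL && offset < b->used);
    return b->base + offset;
}
```

An offset obtained before the block grows still identifies the same data afterwards, whereas an address saved before the growth may no longer be valid; a table that stores offsets or indices can therefore be moved freely.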

To use the environment as described herein with an
operating system which does not have automatic paging,
the advantages of the invention can be achieved by
adding file reads and file writes in the appropriate
places, so that during each phase the appropriate data
structures are in real memory.
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During the software development process, there are 
many source modules 12 and several possible phases of 
activity (e.g., editor 10, compiler 11, liiiker 15). Thus, 
the total memory demand is demonstrated by a two-di- 
mensional matrix 40 (i.e., a memory map) as shown in 
FIG. 4, where rows 41, 42, 43, 44 and 45 represent 
phases and each column 46 represents a virtual memory 
block 39 corresponding to a source text module 12, or a 
data structure such as a table or list generated by one of 
the other parts of the environment (such as cleanlines 
tables, token tables, etc., as will be described). Note that 
there are a number of columns 46 for each type of data 
structure, corresponding to the number of modules 12 
of source text in the application being worked on. For 
the edit phase 41 (when editor 10 is being executed), 
real memory is needed only for the source text of one 
module 12 at a time; all other data can be paged out 
The blocks 39 present in real memory are represented 
by diagonal shading. For the compile phase 43 (when 
compiler 11 is executing), real memory is needed only 
for the changed part of the source text module 12 and 
the saved information internal to the compiler (as will 
be discussed, this includes the token lists, symbol tables, 
etc.) associated with the one module 12 being compiled 
at any given moment; all other data can be paged out. 
For the link phase 44 (when linker 15 is executing), 
memory is needed only for the link tables and compiled 
code. During the run phase 45, only the code increment 
tables arc used. So, in each phase, only a small part of 
the information associated with each module 12 is em- 
ployed and thus must be in real memory; the rest of the 
information for a given one of the modules can be paged 
out. Therefore, regardless of the total memory demand, 
the instantaneous memory requirements, as defined by a 
given phase/module relationship of FIG. 4, are satisfied 
by having only that which is absolutely necessary in 
real memory. This reduces accesses to virtual memory 
(page swapping to disk) thereby increasing the execu- 
tion speed of the current phase. When a new phase is 
started, there is virtual memory activity required to 
page out the information no longer needed in favor of 
the information for the new phase as it is requested. 
Since this is accomplished in a regular manner, it is 
relatively efficient. 
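
The phase/module relationship of FIG. 4 can be pictured with a small sketch. The Python below is illustrative only: the phase names, the structure names, and the helpers `resident_blocks` and `pageable_blocks` are assumptions made for this example, based on a simplified reading of the paragraph above, and are not part of the described system.

```python
# Which kinds of per-module data each phase needs in real memory
# (hypothetical names; a simplified reading of the description of FIG. 4).
NEEDED_BY_PHASE = {
    "edit": {"source_text"},
    "compile": {"source_text", "token_table", "symbol_table"},
    "link": {"link_table", "compiled_code"},
    "run": {"code_increments"},
}

# Every kind of structure that exists for a module.
ALL_STRUCTURES = set().union(*NEEDED_BY_PHASE.values())


def resident_blocks(phase, active_modules):
    """Return the (module, structure) blocks that must be in real memory."""
    return {(m, s) for m in active_modules for s in NEEDED_BY_PHASE[phase]}


def pageable_blocks(phase, active_modules, all_modules):
    """Return every other block, which may be paged out during the phase."""
    everything = {(m, s) for m in all_modules for s in ALL_STRUCTURES}
    return everything - resident_blocks(phase, active_modules)
```

With three modules and one of them open in the editor, only one of the eighteen blocks in this toy matrix has to be resident; the other seventeen can be paged out.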

As a rule of thumb, the memory requirement in using 
the environment of the invention, expressed in number 
of bytes of virtual memory needed, is about five times
the number of bytes of text in the source modules 12.
Each line of code contains, on average, about forty
bytes, so an application containing 100,000 lines of
source code (40 x 100,000 or 4-Mbytes) would require
five times 4-Mbytes or 20-Mbytes as a total memory
requirement. While some computer systems might
allow use of this much real memory, it is nevertheless
expensive and rules out use on lower level systems.
Further, the memory demands in developing an application
containing, for example, 1,000,000 lines of source
code become prodigious: 100-Mbytes. The advantages
of using a virtual memory system with page swapping 
minimized during a given phase become apparent. 
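
The rule of thumb can be written out as a short calculation. This is only a restatement of the arithmetic in the text; the constant and function names are assumptions chosen for the example.

```python
BYTES_PER_LINE = 40    # average line length quoted in the text
EXPANSION_FACTOR = 5   # bytes of virtual memory per byte of source text


def estimated_virtual_memory(source_lines):
    """Estimate total virtual memory demand, in bytes, for an application."""
    source_bytes = source_lines * BYTES_PER_LINE
    return source_bytes * EXPANSION_FACTOR
```

For the 100,000-line application used as the example above, the estimate is 20,000,000 bytes, that is, 20-Mbytes.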

The information maintained in memory by the editor 
10 with respect to each open source file includes a 
source text image module 12a and a source text descriptor
table 12b as illustrated in FIG. 5. The descriptor
table 12b contains information about the lines of text in
the source text module 12a including record identifiers
47, record lengths, and a special bit 48 associated with
each line or record called the modify bit which is used
to indicate whether a particular line of source text has
been modified. This bit is set to a logical 1 by the editor
10 if its associated line is edited. RCASE can set the
modify bit back to 0 after inspecting it. The source text
module 12a and the source text descriptor table 12b
reside in separate disjoint memory spaces (different
blocks 39 in virtual memory 37). This allows the record
descriptor information to be inspected (paged into real
memory 31) without the need to bring the source text
module 12a itself into real memory. This is advantageous
since the descriptor table 12b is typically much
smaller than the source module 12a itself.

In the compilation mode, the compiler 11 saves information
gathered during the compilation of a module 12,
with this information also allocated so as to be separate
(in a block 39 as in FIG. 3b) from other information
saved by this compiler for other modules 12, or any
other attached compiler, and also from any information
saved by the editor 10.

It should be noted that the above feature derives from
the use of callable compilers as opposed to batch compilers.
That is, normally batch compilers simply compile
the source code and leave the results of the compilation
in secondary memory 32 (e.g. a hard disk). In the case of
callable compilers, the compilers are dynamically
linked into RCASE and a compiler 11 keeps informa- 
tion in virtual memory 37 between compilations, includ- 
ing saved context for each application module 12 which 
speeds up subsequent compilations of that module. 

The memory management system for incremental
compilation thereby allocates contiguous space as seen
in FIG. 3b for each separate data structure required in
compilation, ensuring that, except perhaps for
the first and last pages, the pages each contain nothing
except the data selected by the rules described above.

In summary, therefore, the function of the memory
management system of the present invention lies in the
requirement that all code and data required for an editing,
compilation, or linking operation be resident in
virtual memory, and that the structure of data permit the
selection of a small part of the whole data structure
appropriate to each phase of the edit-compile-link-run
cycle, so that the instantaneous demand on real memory
is minimized. This is accomplished in a portable implementation
by using the C language standard function
REALLOC to extend each separate data structure
when necessary so that each separate data structure is in
a contiguous address space in virtual memory, and
therefore (except at each end) guaranteed to exclude
information from any other data structure.
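
The allocation policy summarized above can be sketched as follows. This is a minimal illustration, not the described implementation: the class name `ModuleArena` and its methods are hypothetical, and a Python `bytearray` stands in for a block that a C implementation would extend with REALLOC. The point shown is only that each (module, structure) pair owns its own contiguous buffer, so growing one structure never interleaves its data with another.

```python
class ModuleArena:
    """One growable, contiguous buffer per (module, structure) pair."""

    def __init__(self):
        self._buffers = {}  # (module, structure) -> bytearray

    def append(self, module, structure, data):
        """Extend the buffer for this pair; return the offset of `data`."""
        buf = self._buffers.setdefault((module, structure), bytearray())
        offset = len(buf)
        buf.extend(data)  # grows only this buffer, leaving the others alone
        return offset

    def read(self, module, structure, offset, length):
        """Read `length` bytes back from one structure's buffer."""
        buf = self._buffers[(module, structure)]
        return bytes(buf[offset:offset + length])
```

Because data for module "m1" and module "m2" live in different buffers, a phase that touches only "m1" never needs the pages of "m2" in real memory.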

In a batch compiler, the compiler reads each line of 
source text for a module anew at each compilation, 
records its results in the file system, and exits. In accor- 
dance with the present invention, new functionality and 
data structures are provided which allow the compiler
to reuse previously gathered information (e.g., compiled
code, link lists, etc.) at recompilation time if the
source text has not been changed, thereby substantially
reducing the time required for recompilation after editing
of the source text. The saved information is organized
as a journal of activity across an internal interface
between the modules of the compiler. The basis of this
aspect of the invention is that if the input has not
changed, and certain other validity checks are passed,
the contents of the journal will be the same as what
would pass across the interface if the computation were
repeated, and therefore the journal can be played out to
revalidate the saved data instead, skipping the costly
computation that led to the production of the journal in
the first place. 
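
The journalling idea can be illustrated with a small sketch. It is a simplification under stated assumptions: the class `JournalledInterface`, its `run` method, and the use of a plain equality test on the input as the only validity check are all hypothetical choices for this example; the text above also requires "certain other validity checks" that are not modelled here.

```python
class JournalledInterface:
    """Record what crosses an interface and replay it when input is unchanged."""

    def __init__(self, compute):
        self._compute = compute  # the costly computation behind the interface
        self._journals = {}      # key -> (input snapshot, recorded calls)
        self.computations = 0    # how many times the costly path actually ran

    def run(self, key, source, sink):
        """Deliver the interface calls for `source` to `sink`."""
        saved = self._journals.get(key)
        if saved is not None and saved[0] == source:
            calls = saved[1]  # input unchanged: play the journal out
        else:
            calls = list(self._compute(source))  # rebuild the journal
            self.computations += 1
            self._journals[key] = (source, calls)
        for call in calls:
            sink(call)
```

Running the same input twice delivers the same calls to the consumer both times, but the costly computation runs only once.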

Any interface within a compiler can be journalled.
The current implementation journals the interface to 
the scanner, the emitters and the symbol table. Each 
different computer language, and each different way of 
organizing a compiler, provides the engineer the information
needed to decide which interfaces to journal for
most effective savings in turnaround time. Once the 
interfaces are chosen, and the journals are designed, it is 
often possible to optimize the journals themselves by 
combining several separate journal entries into a more 
efficient larger combined entry. Examples of such com- 
bination are the collection of a sequence of 1-byte emit- 
ter calls into a single n-byte emitter call which in this 
implementation is effected by revalidating the previ- 
ously emitted code already in the table 73 where the 
emitter 72 left it (see FIG. 7), and the combination of
many separate checks of many attributes of a symbol in 
the symbol table into a single check of all attributes at 
once. 

The location of the journals can variously be in the 
static state of the component of the environment that 
produced them or in a general journalling mechanism
provided by RCASE itself. The choice is an engineering
trade-off, taking into consideration other uses to
which the incremental compiler might be put (for exam- 
ple, batch use), which implementing team has the exper- 
tise to implement the various algorithms, and the over- 
all performance and structural integrity of the compo- 
nents of the environment. Both styles of implementation 
are used in RCASE. 

Each time a journal is replayed for effect, the corre- 
sponding application source code must be skipped so as 
to reposition the compiler to either reuse or rebuild the 
next journal. The amount of source code consumed in 
building a journal is always an integral number of appli- 
cation source lines and therefore the amount to skip, 
recorded along with the journal, is also an integral num- 
ber of source lines. The number of lines is called the 
increment size. 

Referring to FIG. 6, RCASE manages this activity 
with an internal data structure called the cleanlines 
table 50. There is one cleanlines table 50 for each appli- 
cation source module 12. RCASE switches between 
tables 50 when context is changed. A cleanlines table 50 
has one entry 51 for every source line in the associated 
application module 12. Each entry 51 of the cleanlines
table has the following information: (1) a field 52 to
locate the application source line description and text,
(2) a field 53 of how many source lines from the current
point toward the end of the application source buffer 12
are unmodified since the last use of their information
(clean), and (3) a field 54 of information provided by the
compiler 11 which is, in fact, the compiler's means of
locating all saved information associated with this line.
The cleanlines table 50 can be built in a variety of ways.
The current implementation makes a pass over the information
records in the source buffer 12 (or its descriptor
table 12b), examining the clean bits 48, and building
the cleanline entries 51 while at the same time deleting 
invalid entries and inserting newly required ones. 
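
A single pass of the kind just described might look like the following sketch. It is illustrative only: the names `CleanlineEntry` and `build_cleanlines` are hypothetical, the three fields mirror fields 52, 53 and 54 in simplified form, and the handling of inserted or deleted lines is deliberately left out.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class CleanlineEntry:
    line_ref: int                # field 52: locates the source line
    clean_run: int               # field 53: clean lines from here toward the end
    saved: Optional[Any] = None  # field 54: compiler's handle to saved information


def build_cleanlines(modify_bits, saved_handles):
    """Build one entry per source line from the per-line modify bits."""
    entries = []
    run = 0
    # Walk backwards so each entry can count the clean lines that follow it.
    for index in range(len(modify_bits) - 1, -1, -1):
        if modify_bits[index]:
            run = 0
            saved = None  # saved compiler information is no longer usable
        else:
            run += 1
            saved = saved_handles[index]
        entries.append(CleanlineEntry(index, run, saved))
    entries.reverse()
    return entries
```

For modify bits 0, 0, 1, 0 the clean-run fields come out as 2, 1, 0, 1: two clean lines can be skipped together from the first line, while the modified third line must be rescanned.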

It is also feasible to keep the cleanlines table 50 up-to- 
date via callbacks from the editor 10, notifying RCASE 
whenever a line of application source code is changed. 
This second solution leads to faster turnaround but 
makes more demands on the editor 10 and is more complex
to implement.

In the RCASE environment, referring to FIG. 6, the
compiler 11 retains the scanner function and, at the first
compilation, the compiler 11 reads the source text 12 in
the conventional fashion, except that it comes through
RCASE 21 from the editor memory image 12 instead of
from a file. In addition, however, the compiler 11 also
constructs a token table which contains, for each lexical
unit of information in the source text, the corresponding
collected and computed information with each such
lexical unit of information being identified and indexed
by a corresponding token (see FIG. 8). The scanner
journal is built for the fewest lines possible. Typically
that is one line, although there are programming language
features that require more than one line to be
recorded in a single lexical increment (see FIG. 9 for an
example). The compiler 11 passes to RCASE 21, for
each line of source text, the corresponding sequence of 
tokens and RCASE saves each line in the form of the 
tokens (see FIG. 9a). 

To describe this operation in a slightly different man- 
ner: 

(1) The source text (from buffer 12) comes from the 
editor 10 through RCASE 21 to the compiler 11 a 
line at a time, passed as a pointer;

(2) The compiler 11 scans the source text, tabulates 
the tokens, and passes the locations of the tokens 
back to RCASE 21; 

(3) RCASE hands the token locations back to the 
compiler 11 when a token is needed. 

This is an example of using RCASE 21 itself as the 
journalling agent. The alternative is to keep the journal
within the scanner and have the scanner check to see if
the application source text is unchanged so that the
token journal can be reused.

A second set of journals managed by RCASE records
emitted code. For the more modern computer languages
the structural nesting of the language can be
reflected into a nested set of saved code fragments. The
goto statement violates this structure and must be
treated with special records of information. For older
languages such as Fortran, the goto is so pervasive that
nested fragments are infeasible and the goto therefore
does not need special records. The advantage of nested
fragments is that it is more efficient to reuse a containing
fragment and all its internal fragments as a single journal
item than it is to reuse the larger number of non-nested
fragments. The types of fragments in the C programming
language are expression statements (non-nested),
conditional statements (if and switch), loops
(for, while and do-while), blocks, and function definitions.

Each time the compiler 11 discovers the beginning of
a saveable structure, it sees if there is a reusable fragment
by various checks including the presence of a
valid save field in the RCASE cleanlines table 50 (see
FIG. 6). If it can be reused the journal is played out, and
RCASE is instructed to skip ahead the appropriate
number of application source lines. Otherwise a new
journal is built and its location is recorded in the cleanlines
table 50. During a rebuild of a fragment (called a
semantic increment), the compiler 11 will use the jour- 
nalled scanner information. When the actual text has 
been modified, the scanner journal will also have to be 
rebuilt before it can be played out. 

To summarize these operations:

(1) The start of a programming construct which is
permitted to form a semantic increment is encountered.
This is discovered either by examining the
leading tokens of the construct or by having previously
examined them and recorded the result.

(2) The possibility of replaying the journal is checked:
the source text must be unmodified, there must be
a valid saved field for this construct in RCASE, and the
fragment must still be consistent with information
in the compiler's tables.

(3) If all checks pass, the journal is played out and the
application source text is skipped. If some check
fails, the journal is discarded, the tokens are recompiled
to build a new journal, and the new journal is
played out. The location of the new journal is
passed back to RCASE for future reference.

(4) In some circumstances (generally bad application
source text formatting) a journal that aligns with
application text record boundaries cannot be built.
In this case no journal is recorded, at the cost of 
losing the possibility for reuse later. 

The effects saved in steps 1, 2, 3 and 4 above have at least two forms:
run effects and symbol effects. The former is represented
by hard-compiled code with perhaps some
pending fix-ups while the latter is represented by a
series of symbol table actions. Most statements
have some elements of both types. The minimum
unit of saving is the line, although multiple contiguous
lines can be bundled into a logical line for state
saving.
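
Steps (2) and (3) above can be sketched as a loop over the source lines. The sketch makes several simplifying assumptions that are not in the text: the function name `compile_module` and its parameters are hypothetical, journals are keyed by starting line number, the only reuse check is that the whole increment is clean, the consistency check against the compiler's tables is omitted, and every increment is assumed to cover at least one line.

```python
def compile_module(lines, clean_run, journals, compile_increment):
    """Compile `lines`, replaying saved journals where the checks pass.

    lines             -- list of source lines
    clean_run         -- clean_run[i] is the number of clean lines starting at i
    journals          -- dict: start line -> (increment size, emitted code)
    compile_increment -- callable(lines, start) -> (increment size, emitted code)
    Returns (emitted code, list of start lines that had to be rebuilt).
    """
    output, rebuilt = [], []
    i = 0
    while i < len(lines):
        saved = journals.get(i)
        if saved is not None and clean_run[i] >= saved[0]:
            size, code = saved  # all checks pass: play the journal out
        else:
            size, code = compile_increment(lines, i)  # rebuild the journal
            journals[i] = (size, code)  # record its location for later reuse
            rebuilt.append(i)
        output.extend(code)
        i += size  # skip the source lines covered by this increment
    return output, rebuilt
```

On a second compilation in which only the middle line of three has been edited, only that one increment is rebuilt; the other two are replayed from their journals.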

A third set of journals managed by RCASE records 
symbol names and attributes. The interface is to the 
symbol table 56 (seen in FIG. 7 and 7a). The journalled
actions are scope entry and exit, symbol lookup and 
enter, and get and set for any attribute. Typically the 
symbol enter and attribute setting is done in response to 
a declarative construct in the application language. 
Typically the symbol lookup and attribute getting is 
done in response to an executable construct in the application
language. Atypically there are situations which
can cause any of the actions in association with any of 
the application language constructs. 

To be completely incremental, the symbol table 56 of the
interactive compiler 11 would have two unique
attributes beyond the structure required for batch compilers:
1) no information can be deleted from the symbol
table at scope exit and 2) every piece of information
must be accompanied by a validity bit. In the alternative,
instead of every piece of information having a
validity bit, a single validity bit can be used for the
entire table entry. When a new compilation of an application
source module 12 is begun, the fully developed
symbol table 56 from a previous compilation is present,
with all validity bits set to false. Each action that would
enter information in the table 56 falls into one of three
categories: 1) the information is found and marked invalid,
and the response is to mark it valid; 2) the information
is found and marked valid, and the response is to issue
an error diagnostic and terminate the compilation; 3)
the information is not found, and the response is to enter it,
so long as it is consistent with all information currently
marked valid.

Each action that would take information from the 
symbol table 56 is journalled as a check to see that the
expected answer is the found answer. The purpose of
these journals is to check that the assumptions under
which an executable journal was built (the previous
section of this explanation) are still valid.

There are three situations: 1) the information is found
and marked valid, and the response is to permit reuse of the
executable journal; 2) the information is found and
marked invalid, and the response is to invalidate the executable
journal and rebuild it; 3) the information is not
found, and the response is likewise to invalidate the executable
journal.

In the process of rebuilding an executable journal the
same three situations can occur. In case 1) the rebuild- 
ing continues. In cases 2) and 3) the rebuilding ceases, 
an error diagnostic is issued and control is returned to 
the developer to correct the situation. 

Upon closing a scope, either when played out of a journal
or during a rebuild of a block, all of the local information
in the symbol table 56 pertaining to the closed
block is left in the table but marked invalid and removed
from the lookup path. The consequence is that by the
end of compilation for an application source module 12,
its entire symbol table 56 is removed from the lookup 
path and all information is marked invalid, which is the 
correct starting situation for later reuse. 
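
The validity-bit behaviour described in the preceding paragraphs can be sketched as follows. This is an illustration under assumptions, not the described implementation: the class `IncrementalSymbolTable` and its method names are hypothetical, it uses the single-bit-per-entry alternative, attributes are reduced to one comparable value, and an invalid entry found with different attributes is simply replaced.

```python
class IncrementalSymbolTable:
    """Symbol table whose entries survive scope exit but carry a validity bit."""

    def __init__(self):
        self._entries = {}  # (scope, name) -> [attributes, valid]
        self._path = []     # open scopes, innermost last (the lookup path)

    def open_scope(self, scope):
        self._path.append(scope)

    def close_scope(self):
        """Leave the scope's entries in the table but mark them invalid."""
        scope = self._path.pop()
        for (entry_scope, _), entry in self._entries.items():
            if entry_scope == scope:
                entry[1] = False

    def enter(self, name, attributes):
        """Apply the three cases for an action that enters information."""
        key = (self._path[-1], name)
        entry = self._entries.get(key)
        if entry is not None and entry[1]:
            raise ValueError("already declared: " + name)  # found and valid
        if entry is not None and entry[0] == attributes:
            entry[1] = True                                # found and invalid
            return "revalidated"
        self._entries[key] = [attributes, True]            # not found
        return "entered"

    def check(self, name, expected):
        """Journalled lookup: True only if a valid entry gives the expected answer."""
        for scope in reversed(self._path):
            entry = self._entries.get((scope, name))
            if entry is not None and entry[1]:
                return entry[0] == expected
        return False
```

After the outermost scope is closed at the end of one compilation, every entry is still present but invalid, so a lookup at the start of the next compilation fails until the declaration is seen again and the entry is revalidated.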

Other journals are needed for some languages. Incremental
preprocessing for macro expansion is one example.

The saved information, mostly in journals, is attached 
to a context associated with a single application source 
module 12. Each module 12 has its own set of journals. 
The compiler 11 is instructed to set its context to the
proper information prior to any use of that information.
During compilation the information is checked and
usually changed to some degree. During linking the
information is accessed to find and set addresses that
cross application module boundaries. The RCASE
command to the compiler is SetContext; it can be issued 
at any time that RCASE is in control. 

The saved information is the substance of the check- 
point files written upon command from the developer. 
The restart entry of RCASE is a special way to enter
the environment so that all state is restored to RCASE
and its attached components, restoring the situation so
that the developer can continue where work was left off
prior to the previous checkpoint operation. Not all
useful information need be saved at checkpoint, but
rather only that information that cannot be recreated, or
is too expensive to recreate upon environment restart.
This involves engineering tradeoffs that are dependent
on the details of the application language, the structure
of the incremental compiler and the various performance
factors of the host system and its backing store.

In addition to compiling the individual source text
files into executable code, the compiler must also link
the executable code files. In RCASE, this linking is
unique in that it is done in such a manner as to retain the
module boundaries between executable code files that
originate in the original modular structure of the source
text.

This linking is done in RCASE in memory through
the use of pointers to point to the "absolute", that is,
actual memory, addresses of the code to be linked
rather than to file-related addresses. RCASE uses internal
tables to identify the "ends" of each link, that is, to
identify the points at which addresses are to be inserted
into the compiled code and where in the compiled code
each address is to point to.

It is common experience in system programming that
link time for several modules is approximately the same
as compilation time for one module. The consequence
for batch compilers is that even if either compiling or
linking alone were instantaneous, the other would limit
any gains in turnaround to 50% speedup. The consequence
for RCASE is that linking must be speeded up as
much as compiling for any gains to be realized. Existing
incremental linkers operate under two constraints
which limit the available improvement: 1) the result of
linking is placed in a file for later activation, costing
both the time to format the information and the file
write time; furthermore the file must be designed so that
it can be incrementally changed; and 2) when an application
source module is changed, an entirely new object
module is produced by the compiler, thus all of its supplied
information must be distributed throughout the
compiled application as well as the information from
the rest of the application being supplied back to the
changed module.

An RCASE incremental compiler avoids changing
either the supplied address or addresses of places needing
external values. This action is, in fact, largely a
byproduct of the fine-grain incremental routines employed
to speed up recompilation. Therefore when a
recompilation takes place, the number of places needing
correction may be none at all, or perhaps only a few.
The best case for conventional incremental compilers
(whole module changes) is the worst case for RCASE.
The incremental compiler 11 must supply additional
information to the incremental linker 15 to activate this
saving (see FIG. 6c). Each entry 57 in the local link
table 58 for an application module 12 has, in addition to
the traditional NEED/DEF fields, a field 59 giving the
incremental status of the information: "new", "old", or
"deleted". The incremental linker 15 updates both ends
of "new" information and removes "deleted" entries 57
from its own tables.

There is, as in the case of maintaining the cleanlines
table 50, an alternative method of keeping the tables 58
of the incremental linker 15 correct. The incremental
linker 15 can supply entries through which it can be
notified of changes by the compiler 11 so that they can
be dealt with one at a time during compilation.

As described above, the editing of source text will
require the recompilation of at least the changed portions
of the text. It is frequently the case, however, that
a change in one module 12 of source text will be reflected
in other modules 12 of the source text that have
not, in themselves, been changed by the editor and these
dependent modules of source text must also be recompiled
and relinked. In the prior art, this has usually been
handled by either assuming that all of the text must be
recompiled or by examining a developer-prepared dependency
file (often called a makefile), and then consulting
the file system time-of-last-write data for each
file to ensure that each dependent module has a later
time-of-last-write than any module upon which it depends.
Modules that fail this test are then recompiled in
order of least dependent first. The prior art has three
deficiencies that are corrected by the features of
RCASE: 1) the developer-prepared makefile must be
correct for the dependency analysis to be reliable; as
changes are made during development it is common for
the makefile to become incorrect and for the developer
to fail to correct it, 2) the developer must plan for the
worst case in preparing the makefile, often causing
unnecessary recompilation, 3) the smallest unit for
which the developer can express dependencies is a complete
application module, whereas in fact changes are
nearly always to a small part of such a module.

In contrast, according to an incremental dependency
analysis feature, RCASE generates and stores fine grain
dependency graphs 60 as seen in FIG. 6b identifying in
field 61 dependencies between symbols within the application
modules 12, and in field 62 the times of last
change to each given application source module 12.
RCASE may therefore, at any time, from graphs 60,
identify all changed sections of source text and those
portions of source text that are dependent on a given
section of source text that has been changed. The automatic
generation of dependency information from the
application source modules 12 relieves the developer of
the need to express and maintain the dependency information
in the makefile, and in addition increases reliability
by eliminating a source of developer-introduced
error. This feature in turn substantially decreases the
time required for recompilation by allowing the identification
of only those portions of text that must be recompiled
because of a direct change or a dependency on
a changed section of code without complete recompilation
or computation of the dependencies. It is rarely
necessary to recompute the dependency graph 60 unless 
the changes are of such magnitude as to substantially 
modify the organization of the modules 12 of the source 
text, which is a relatively rare event. 
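
Identifying what must be recompiled from a fine grain dependency graph amounts to a reachability search. The sketch below is one plausible way to do this, assuming the graph is held as a mapping from each section to the sections it depends on; the function name `sections_to_recompile` and this representation are assumptions for illustration, and the time-of-last-change field 62 is not modelled.

```python
from collections import deque


def sections_to_recompile(depends_on, changed):
    """Return the changed sections plus everything that depends on them.

    depends_on -- dict: section -> set of sections it depends on
    changed    -- iterable of sections edited since the last compilation
    """
    # Invert the graph so we can walk from a section to its dependents.
    dependents = {}
    for section, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(section)

    dirty = set(changed)
    queue = deque(dirty)
    while queue:
        current = queue.popleft()
        for user in dependents.get(current, ()):
            if user not in dirty:
                dirty.add(user)
                queue.append(user)
    return dirty
```

If a function `main` uses `helper`, and `helper` uses a constant `MAX`, then a change to `MAX` marks all three for recompilation while an unrelated section is left untouched.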

It is a common problem in software development 
that, due to the magnitude and complexity of the infor- 
mation involved, the shutting down and restarting of 
work on a project, for example at the end of the work- 
day or in the event of a system failure, is normally quite 
time consuming. In addition, if certain processes are not 
completed when shutdown occurs, certain information 
and a certain amount of work may be lost. This disad- 
vantage arises in the usual software development envi- 
ronment because the relevant information is saved and 
restored in the form of standard files, much in the same 
manner, for example, as a document is saved when an 
editing session is ended. Information is lost if the file has
not been recently saved, or if it is the nature of the
environment to keep the file in a temporarily unrecoverable
state during the execution of the environment.

In RCASE, as previously described, the entire process,
including all data being worked upon, such as
source text and compiled code, the editors, compilers
and so forth in use, and the process information are
always resident in virtual memory. Taking advantage of
this, RCASE includes a "suspend" command which
saves the relevant states from each module in memory
to a file and then reports the file name to RCASE upon
termination of a session. RCASE saves the list of filenames
in a single file. Operation may then be resumed
by a corresponding "restore" command, with the entire
state being returned from the single file. It has been
found that this approach results in very substantial savings
in the time required to shut down and restart work
because the intermediate state of RCASE can be saved
in addition to the normal file products. It is this intermediate
state that is so expensive to restore. In addition,
the user sees exactly the same state upon restart as at 
suspend, thereby saving the time and energy normally 
required for reorientation after restart. 
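
The suspend and restore commands can be sketched as a pair of functions. Everything specific here is an assumption made for illustration: the function names, the use of JSON as the saved format, and the file naming scheme are not taken from the text. What the sketch keeps from the description is the shape of the mechanism: each component's state goes to its own file, and a single file records the list of filenames.

```python
import json
import os

STATE_SUFFIX = ".state.json"  # hypothetical naming scheme


def suspend(components, directory):
    """Save each component's state to its own file; return the manifest path."""
    names = []
    for name, state in components.items():
        path = os.path.join(directory, name + STATE_SUFFIX)
        with open(path, "w") as f:
            json.dump(state, f)
        names.append(path)  # the file name is reported back for the manifest
    manifest = os.path.join(directory, "session.manifest.json")
    with open(manifest, "w") as f:
        json.dump(names, f)  # the list of filenames lives in a single file
    return manifest


def restore(manifest):
    """Rebuild every component's state from the single manifest file."""
    with open(manifest) as f:
        names = json.load(f)
    components = {}
    for path in names:
        name = os.path.basename(path)[:-len(STATE_SUFFIX)]
        with open(path) as f:
            components[name] = json.load(f)
    return components
```

Restoring from the manifest yields the same per-component state that was present at suspend, which is the property the paragraph above relies on.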

Referring to FIG. 7, a diagram of the incremental
compiler 11 is illustrated. The front end structure of the
compiler includes a scanner 65 which receives the
source text (via RCASE from the editor) and the cleanline
increments, and generates token tables 66 and lexical
increment tables 67, as will be described. A parser 70
receives the tokens from the scanner and passes filtered
tokens and rules to the code generator 71, which, for
executables, via emitter 72 produces increments of object
code maintained in code increment tables 73, while,
for declaratives, symbol tables 56 are produced and
maintained via symbol table manager 74.

The following is an architectural description of the 
structure for an incremental compiler 11 to be sup- 
ported by RCASE, according to one embodiment. An 
incremental compiler supporting RCASE will have the 
following characteristics: 

For shared modules between batch and incremental 
compiler: 

(a) Provide for reusable front end modules (scanner 
65, parser 70, and possibly the generator 71) that 
can be shared by both the batch and incremental 
versions of the compiler. This enforces some de- 
gree of source language consistency between batch 
and incremental environments. 

(b) Be structured so that its reusable modules can be, 
and are, tested standalone. 

(c) Support a callable interface to an incremental 
version of the compiler. 

The front end internals must: 

(a) Support the ability to generate and reuse lexical 
increments. 

(b) Support skip-scanning, the ability to stop the scan 
at standard points and restart it as though it had 
scanned the intervening source lines. 

(c) Use a token interface between its scanner 65 and 
parser 70. 

(d) Specify a standard context-free grammar for its 
language. 

(e) Support skip-parsing, the ability to stop the parse 
at standard points (e.g., begin-of-statement) and 
restart it as though it had parsed the intervening
tokens. 

(f) Use a "token + rule" interface between its parser
70 and generator 71. 

(g) Support the ability to generate and reuse semantic 
increments. 

(h) Support an incremental symbol table 56.

Journalling:

(a) Journalling technology is used to check the validity
and perhaps reuse previously generated information
(such as token tables 66, generated code of
tables 73, symbol tables 56, etc.).

Context switching and checkpointing should be provided
to support:

(a) State-saving and context switching of journalled
information. 

(b) Checkpointing of journalled information to a file
for reuse upon restart. 

Memory management must be provided to:

(a) Avoid thrashing by allocating memory for journalled
information on a per-source buffer 12 basis.

An overview diagram depicting the context in which
an incremental compiler 11, according to the present 
invention, resides is shown in FIG. 6. In this diagram, 
the editor 10 manages multiple source buffers 12 and 
provides source text to RCASE 21. RCASE interacts 
with the editor 10 to identify the source lines in a given 
buffer that have been modified since the last time they 
were processed by the compiler 11. RCASE presents an 
abstraction of the source buffer 12 called cleanline in- 
crements to the compiler 11. Cleanline increments 
allow the compiler 11 to determine if it can reuse saved 
information it had generated in a previous compile-ses- 
sion of the same source buffer 12. If not, the compiler 11 
can obtain the text for the appropriate source lines 
through RCASE 21 to generate the necessary information
which can then be reused in the next compile-session.

The output of the compiler 11 is a code-data-symbol
buffer 14 that contains all the necessary information
required by the incremental linker 15. For each source
buffer 12 that is incrementally compiled there is a corresponding
code-data-symbol buffer 14 for it. The
code-data-symbol buffer 14 is in fact organized as a collection
of independently allocated and managed areas in virtual
memory 37 and is retained only in virtual memory until
such time as the developer requests a checkpoint.

RCASE is designed to support multiple languages so 
at any given moment there may be more than one com- 
piler 11 active in the environment. Every time RCASE 
receives a request to compile a source buffer 12, it must
determine if the appropriate compiler has been acti- 
vated. This requires that incremental compilers 11 sup- 
port the RCASE callable interface so that RCASE can 
make them available when necessary. 

FIG. 7 contains the general structure of the front end
of RCASE incremental compiler 11. The source lines
and cleanline increments come from RCASE. The scanner
and preprocessor 65 together produce a sequence of
tokens which refer back to the token tables 66 via access
functions for detailed information. The parser 70 discovers
the sequence of grammar rule applications to
construct the canonical parse. The generator 71 produces
the intermediate language (IL) and sends it to the
back end structure. The lexical increment table 67 contains
journalled information for saved-state in the scanner
65. When a lexical increment is reused, skip-scanning
occurs.

It is desirable for a batch and incremental compiler 11
to share the code for the scanner/preprocessor 65 and
parser 70, and maybe the generator 71. It is unlikely that
any other modules can be shared for the finest grain
incremental compilers.

FIG. 7 also contains the structure of the incremental
back end. The incremental back end receives the
intermediate language (IL) from the front end. The IL is
delivered either to the symbol table manager 74 or to
the checking-emitter 72. The symbol table manager 74
constructs and updates the incremental symbol tables
56. The checking-emitter 72 accesses the symbol table
56 when necessary, and generates unoptimized
checking code which is managed in the code increment tables
73 and semantic increment tables 76.

The checking-emitter 72 trades off target-code
quality for rapid turnaround. The emitted code is generated
in increments which are independent of the code for
surrounding increments, at the cost of preventing
cross-statement optimizations, and at the gain of enabling
incremental update. The checking-emitter 72 also adds
checking code for bounding memory access, detecting
aliasing and uninitialized variables, and so on.

In FIG. 6, the final product of the compiler is
described as a code-data-symbol buffer 14. Logically, a
single code-data-symbol buffer 14 is the composition of
the corresponding symbol table 56 and code increment
table 73. There is a procedural interface that enforces
this logical view which is used by the RCASE
incremental linker 15.

As shown in FIG. 7, the main job of the scanner 65 is
to convert its input of cleanline increments and source
text into a sequence of tokens for the parser 70. The
scanner 65 is able to perform this task so that it only
needs to actually scan the line if the line has just been
created or modified. This capability is implemented
using a data structure called a lexical increment. The
scanner 65 also supports the ability, called
skip-scanning, to skip ahead in the input source-line stream. In
addition, the scanner 65 hides the physical layout of the
representation of tokens from the rest of the compiler
11, by communicating through a token interface. All of
these concepts are described below.

A cleanline increment is an abstraction presented to
the scanner 65 by RCASE. It can answer the question
"How many unmodified source lines follow the current
line?". This is useful information since RCASE
incremental compilation is based on journalling effects of
multiple consecutive lines of text (lexical and semantic
increments). When the answer is "The needed text has
been modified," then RCASE provides access to the
raw text in the editor 10.
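
By way of illustration only, the cleanline query described above
might be sketched as follows in Python. The class CleanlineView and
its method names are assumptions made for this sketch; they are not
part of RCASE, and the per-line "dirty" flag is only one possible
way of recording which lines the editor has modified.

```python
class CleanlineView:
    """Hypothetical stand-in for the cleanline abstraction over one source buffer."""

    def __init__(self, lines):
        self.lines = list(lines)
        self.dirty = [True] * len(self.lines)  # nothing has been compiled yet

    def mark_compiled(self):
        """Record that the compiler has processed every line."""
        self.dirty = [False] * len(self.lines)

    def edit(self, index, text):
        """Editor-side change: replace a line and flag it as modified."""
        self.lines[index] = text
        self.dirty[index] = True

    def clean_lines_after(self, index):
        """How many unmodified source lines follow line `index`?"""
        count = 0
        for flag in self.dirty[index + 1:]:
            if flag:
                break
            count += 1
        return count

    def raw_text(self, index):
        """Access to the editor's raw text, needed only for modified lines."""
        return self.lines[index]


view = CleanlineView(["a = 1;", "b = 2;", "c = 3;", "d = 4;"])
view.mark_compiled()
view.edit(2, "c = 30;")
```

After the edit, the compiler can learn that only one clean line
follows line 0, and it fetches raw text only for the modified line.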

Typically a line of source text can be scanned by
itself. The compiler-significant content of a line is a
sequence of tokens. In unusual cases, such as Fortran
"CONTINUE" or C end-of-line override "\", several
lines must be scanned as a unit. The lexical unit of one
or more lines is called a lexical increment. The use of a
lexical increment is to record and later hand off
successive tokens to the parser 70. A lexical increment
provides the following capabilities:
Check( ): performs various validity checks such as
    whether or not the current set of consecutive
    source lines associated with this lexical increment
    have been modified (or in some other
    circumstances having to do with context-dependence of
    token structure which must be checked outside of
    the scanner mechanisms). It uses cleanline
    increment information to perform some of its checks. It
    returns TRUE if the validity checks succeed.
ScanIncrement( ): creates a new lexical increment. It
    consumes as few complete lines as possible, starting
    at the first line it is given, to build a valid lexical
    increment. Its value is the number of lines actually
    consumed (rather than the number inspected). By
    checking after application of ScanIncrement( ), it is
    possible to find out if building this lexical increment
    invalidated the next one (by invading its territory
    or leaving a gap).
FirstToken( ): sets the lexical increment's current
    context to the location of the first token in its token
    list.
NextToken( ): is an iterator that updates the lexical
    increment's current context to the location of the
    next token in its token list.
Token( ): returns as its value the "handle" for the
    current token (handles are defined when the token
    interface is described). A null-handle value is
    returned when the end of the list is reached.
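
The capabilities listed above can be sketched, under stated
assumptions, as a small Python class. The sketch is not the
embodiment's implementation: the whitespace tokenizer, the use of
plain strings in place of token handles, and the snake_case method
names are all simplifications introduced here. Only the C
end-of-line override "\" is modelled as a reason to join lines.

```python
class LexicalIncrement:
    def __init__(self):
        self.first_line = 0
        self.lines = []    # source text this increment was built from
        self.tokens = []   # saved tokens (plain strings stand in for handles)
        self.pos = 0

    def scan_increment(self, source, start):
        """Consume as few complete lines as possible; return the count consumed."""
        end = start
        while source[end].endswith("\\") and end + 1 < len(source):
            end += 1
        self.first_line = start
        self.lines = source[start:end + 1]
        joined = " ".join(line.rstrip("\\") for line in self.lines)
        self.tokens = joined.split()
        return len(self.lines)

    def check(self, source):
        """True if the lines covered by this increment are unchanged."""
        stop = self.first_line + len(self.lines)
        return source[self.first_line:stop] == self.lines

    def first_token(self):
        self.pos = 0

    def next_token(self):
        self.pos += 1

    def token(self):
        """Current token, or None (the null handle) at the end of the list."""
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None


source = ["x = 1 ;", "else \\", "y = 2 ;"]
inc = LexicalIncrement()
consumed = inc.scan_increment(source, 1)  # the two joined lines form one increment
```

While check( ) succeeds, the saved tokens can be replayed through
first_token( ), next_token( ) and token( ) without touching the text.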

As indicated in FIG. 7, lexical increments are orga- 
nized in tables 67. Every source buffer 12 that is com- 
piled has a lexical increment table 67 associated with it. 
A lexical increment table 67 also has various capabilities 
defined to add, delete, iterate, and update its entries. 
FIG. 9 illustrates the relationship between source lines 
and lexical increments. 

The motivation behind the design of the lexical
increment is that ScanIncrement( ) need only be applied
when the corresponding text has been changed. The
saved scanned tokens of the lexical increment are the
basis for incremental scanning with its corresponding
speedup of compilation, both because of the saved CPU
time and also because of the potential saved page faults
in not accessing the text.
Skip-scanning is the ability of the scanner 65 to stop
the scan at standard points and restart it further down in
the source text as though it had scanned the intervening
source lines. This action must be applied when a lexical
increment from table 67 is reused so that the scanning
context can be updated to the appropriate position in
the source text that follows the last token of the reused
lexical increment. This action must also be applied
when semantic increments from tables 76 are reused
since their associated lexical increments are indirectly
reused. As long as the scanning context is maintained,
the scanner 65 will know where to start scanning when
a new lexical increment needs to be generated.

The token interface is used by the scanner 65 to hide
the physical layout of the storage of tokens from the rest
of the compiler 11. This strategy allows the scanner 65
to hide the details of how it supports state-saving and
check-pointing of tokens. It also encourages the
construction of a reusable scanner-module (batch vs.
incremental) because of the resulting simplified interface
between the scanner and the rest of the compiler 11.

The main technique applied in the token interface is
the use of a "handle" to identify a token. A handle is, in
the simplest case, an index into an array of structures
private to the owner of the information, each array
element describing a unit of information. Access to the
information is provided by a procedural interface
provided by the owner. More complex encodings of
handles are sometimes chosen for engineering reasons,
principally for efficient implementation of the access
functions.

So in FIG. 7, all occurrences of tokens being
exchanged by the scanner 65 and the rest of the compiler
11 are really the handles for tokens. When information
on a token is required, the other parts of the compiler 11
use the handle and the access functions of the token
interface.

The scanner 65 deals with the transformation of raw
text into tokens. The first step is to discover the
lexemes. The lexeme must then be associated with the
corresponding structure from the language. For some
languages this is a trivial mapping; for others with
macros or keyword features, it may require a computation
and other tables.

FIG. 8 shows the structure of a token table 66. A
lexeme is the unit of information in source text. The
only distinguishing feature of a lexeme is the text
composing it. The number of unique lexemes is less than the
total number of lexemes in the source text. This
property is exploited in compilers designed to function
within the RCASE environment by tabulating unique
lexemes, permitting comparisons on the table indices
(i.e. handles) rather than on the text characters. The
tabulating object is a lexical table 77. The strings are
packed away (assume null-terminated ASCII format for
the moment) in a non-collectible heap referred to as the
string table 78. The lexemes are indices into the heap or
string table 78. The lexical tables 77 are allocated one to
a source text buffer 12 to avoid thrashing.

The job of the parser 70, as shown in FIG. 7, is to
report a sequence of grammar rule applications, the
so-called canonical parse. The scanner 65 is
simultaneously reporting a sequence of tokens to the parser 70.
It is the complete sequence of tokens that drives the
parser 70. These two sequences contain everything that
must be known by the rest of the compiler 11.

An incremental parser 70 can perform this job and
provide the additional capability of only re-parsing
portions of the token-stream that have been modified
since the previous compile-session. This capability is
known as skip-parsing.

It is feasible to construct a context-free grammar for
a programming language. As it happens such
context-free grammars describe all legal programs and some
illegal programs. These illegal programs must be
detected by other mechanisms in the compiler. It can be
required, in addition, that the grammar is LALR(1), and
that it reflects the semantics of the language as defined
by the batch and incremental compilers. Any grammar
successfully used as input to a LALR-based compiler
meets this requirement.

For use in the RCASE environment it is required that
a context-free grammar be constructed for each
programming language. This constraint does not require
the use of an LALR grammar processor; the
canonical parse can be generated with recursive descent
techniques as well.

The following sections describe skip-parsing and
the token-rule interface between the parser and the
generator.

Skip-parsing is the ability to stop the parser 70 at
standard points (e.g. begin-of-statement) and restart it as
though it had parsed the intervening tokens. This is
implemented differently depending on the type of
parser (recursive descent or LALR). A recursive
parser skips a designated increment by not calling the
corresponding parsing function (see the discussion of
semantic increments below for details on designated
increments). A table driven parser skips indirectly by
providing a service which updates the internal state of
the parser to carry it from before-to-after for the
designated increment. Skip-parsing is invoked from the
generator 71 in both cases.

When skip-parsing occurs, semantic increments from
tables 76 are re-used in place of computing the semantic
actions for the skipped tokens and rules. The reuse of
semantic increments (made possible by skip-parsing)
contributes significantly to the speed-up of the compile
turnaround. Note, skip-parsing implies skip-scanning.

The token-rule interface enforces loose coupling
between the parser 70 and generator 71. Minimizing the
dependencies between these two components facilitates
the implementation of skip-parsing, and the
construction of reusable parser and generator modules. In
addition, the token-rule interface hides the detail on the type
of parser 70 being used from the generator 71.

A large proportion of token and rule sequences can
be filtered out for efficiency reasons prior to semantic
analysis because it has no semantic content. Specifically,
the tokens with content, meaning identifiers, constants,
and strings, are passed on by the parser 70 to the
generator 71 with, or just prior to, passing the rule that
includes them so that the semantic routines get the
content they need. Similarly, only the rules with content
are passed from the parser 70. In the generator 71 one
typically finds a (very large) switch, one entry for each
content rule. The generator 71 passes a sequence of
semantic actions on to the back end.

From the recursive descent viewpoint, it means that
the recognizer behaves much like its LALR
counterpart and does nothing except emit content tokens and
rules. From the generator viewpoint, it means it does
not matter whether a recursive or LALR parser is being
used, or both, or even a mixture.
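
As a hedged illustration of the token-rule interface, the Python
sketch below filters a canonical parse down to content tokens and
content rules and feeds them to a generator organized as a switch.
The rule names ("primary", "add", "assign"), the event tuples and
the textual "store" action are hypothetical, chosen only for this
sketch. The generator sees the same event stream whichever kind of
parser produced it.

```python
CONTENT_TOKEN_KINDS = {"identifier", "constant", "string"}
CONTENT_RULES = {"add", "assign"}  # hypothetical rule names


def filter_parse(events):
    """Keep only the tokens and rules that carry semantic content."""
    for kind, value in events:
        if kind == "token" and value[0] in CONTENT_TOKEN_KINDS:
            yield ("token", value)
        elif kind == "rule" and value in CONTENT_RULES:
            yield ("rule", value)


def generate(events):
    """Generator: one case per content rule, producing semantic actions."""
    pending, actions = [], []
    for kind, value in filter_parse(events):
        if kind == "token":
            pending.append(value[1])
        elif value == "add":
            right, left = pending.pop(), pending.pop()
            pending.append(f"({left}+{right})")
        elif value == "assign":
            source, target = pending.pop(), pending.pop()
            actions.append(f"store {target} <- {source}")
    return actions


# Tokens and rule applications for "x = y + 2;" in the order reported.
events = [
    ("token", ("identifier", "x")),
    ("token", ("punct", "=")),
    ("token", ("identifier", "y")),
    ("token", ("punct", "+")),
    ("token", ("constant", "2")),
    ("rule", "primary"),  # no semantic content, filtered out
    ("rule", "add"),
    ("token", ("punct", ";")),
    ("rule", "assign"),
]
actions = generate(events)
```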
The job of the generator 71, as shown in FIG. 7, is to
pass on the IL, which among other things contains the
sequence of semantic actions (emitter and symbol table
actions), to the back end. The generator 71 is
implemented as a large switch. In each case the appropriate
call is made to the back end. When certain conditions
arise, the generator 71 will invoke skip-parsing to reuse
a semantic increment and thus save time in the back end
generating code.

Regarding the requirements for the back end of an
incremental compiler 11, the back end modules are not
to be shared between the batch and incremental
versions of the compiler. Batch compiler writers should not
be concerned with the specifics of the incremental back
end. The job of the back end is to produce the input for
the incremental linker 15. It performs most of this job
by producing and reusing semantic increments of table
76 and manipulating an incremental symbol table 56.

The semantic content of part of a source module 12 is
recorded in a semantic increment (in table 76). The
boundaries of the semantic increment correspond to
semantic units in the source language (for example, a
semantic increment may be an assignment statement).
Because the semantic units of modern programming
languages nest, so also do semantic increments. A
semantic increment provides the following capabilities:
Check( ), insures that the precomputed state of the
    semantic increment is consistent with the state of its
    surroundings. Check( ) never has any side effects.
Compile( ), computes the effect of this increment,
    both in terms of symbol table access and the
    production of executable code. This is called when
    Check( ) returns a failure status. The task of
    Compile( ) is to record the effect in the semantic
    increment tables if possible, and otherwise leave it to be
    executed without the incremental capability. In this
    last case Compile( ) reports a failure to build an
    increment so the following function, Apply( ), will
    be able to return immediately to the compiler
    proper without attempting an application.
Apply( ), insures that the effect of compiling the
    increment is applied. Apply( ) is called after
    Compile( ) when Check( ) fails.

The RCASE environment requires separation of
code implementing parsing and semantic actions. The
means is restriction of the parser 70 to the production of
the canonical parse and associated lexical information.
This requirement enables skip-parsing when
recompilation is not necessary. For any particular programming
language there will be a set of "designated increments"
supported by RCASE. For example, the set might
include assignment statement, declaration, conditional
statement, and function definition. RCASE requires
that the incremental compiler 11 produce disjoint
executable code for disjoint designated increments. This
disjointness property enables incremental semantics.
Nested increments do not need to be disjoint, of course,
but they do need to start on an application source line
that is unique to the increment. That is, even nested
increments cannot start on the same line.

The relation between the various levels of abstraction
is (perhaps deceptively) simple. The raw text is
maintained by an editor 10; it is chunked into scannable
objects (lexical increments of tables 67) by the scanner
65 of the compiler. Any contiguous sequence of lexical
increments may fall within the scope of a semantic
increment. Semantic increments always properly nest,
and further, never share a first lexical increment. This
latter is partly an artifact of programming language
design (something new always starts with distinctive
syntax) and partly a concession to implementation
convenience, since it is necessary to associate saved state
with unique application source text lines and therefore
inconvenient to have to save two lines of information in
association with one line (see FIG. 9).

The example in FIG. 9 shows a section of a source
buffer 12 containing a C program. Note, even though
the first assignment statement

x!»<*4->)+(x-;')

spans multiple lines, each line is separately scannable so
a lexical increment is created for each of its lines (L2 &
L3); a semantic increment S1 is formed by this
statement. However, the "ELSE" clause uses C's override
character "\" to span multiple lines which results in
each of its lines not being separately scannable. In this
case, a single lexical increment is created to contain all
lines required to scan this particular construct (L4).
FIG. 9a illustrates the token handle list contents for the
lexical increments L3 and L4 (the tokens are entered
into the token table 66, the lexical increments into table
67).

The designated increments supported by semantic
increments in FIG. 9 are assignment (S1 and S2) and
if-else constructs (S3). FIG. 9b illustrates the contents of
a semantic increment.

A semantic increment is either a set of effects on
emitted code or the symbol table 56; the symbol table 56
is described below. The executable effect of a semantic
increment (such as S1, S2 or S3) is the code that is
generated for it. Referring to FIG. 9b, an entry 85 in the
semantic increments table 76 includes an identification 86
of the semantic increment, the validity checks 87 that
need to be run and a field 88 to identify the code
increment that can be reused if the checks are passed. In the
case of S2 the validity checks are: the line containing
the source code has not been modified, and the
attributes of variable x are unchanged. In this case the effect
is to reuse the saved code designated by handle 101. It
is important to emphasize that the semantic increments
can be nested to arbitrary depth; the consequence is that
longer increments of the saved state can be reused for
the same constant overhead that each smaller increment
would require; this is a major determinant of the speed
of the system of the invention, and one that is unique.
That is, the contents of the semantic increment in the
semantic increment table 76 are checks for validating it
and a handle into a code increment table 73. A code
increment table 73 contains fragments of executable
code. Each is contiguous, perhaps in need of some fixup
addresses, but otherwise ready for the CPU.

When a semantic increment is reused, the code
generation phase for it is skipped by reusing the code
fragment from table 73. Much (or all) of an execution image
in RCASE is carried out within the code increment
tables 73.

FIG. 9c illustrates the organization of a code
increment table 73. A code increment table 73 is actually
composed of two sub-tables. The code index table 79
contains information for managing the code fragment
table 80. A semantic increment S1, S2, etc., contains a
handle 81 to access an entry 82 in the code index table
79. From a code index entry 82, a code fragment entry
83 can be derived. A code fragment entry 83 contains an
executable code sequence.
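
A minimal Python sketch of the Check( )/Compile( )/Apply( )
protocol and of a two-level code increment table follows. It is an
illustration under assumptions, not the described embodiment: the
class and function names are invented for this sketch, "code" is
represented by a placeholder string, the only validity checks
modelled are those given for S2 (unchanged source line, unchanged
attributes of one variable), and calling apply( ) on the reuse path
as well as after compile( ) is a choice made here for brevity.

```python
class CodeIncrementTable:
    """Code index table (handle -> fragment slot) plus code fragment table."""

    def __init__(self):
        self.index = []      # code index entries
        self.fragments = []  # code sequences (placeholder strings here)

    def add(self, code):
        self.fragments.append(code)
        self.index.append(len(self.fragments) - 1)
        return len(self.index) - 1  # handle into the index table

    def fetch(self, handle):
        return self.fragments[self.index[handle]]


class SemanticIncrement:
    def __init__(self, line_no, var):
        self.line_no, self.var = line_no, var
        self.saved_line = self.saved_attrs = self.code_handle = None

    def check(self, source, symbols):
        """No side effects: is the saved state still consistent?"""
        return (self.code_handle is not None
                and source[self.line_no] == self.saved_line
                and symbols.get(self.var) == self.saved_attrs)

    def compile(self, source, symbols, table):
        """Recompute the effect and journal it for later reuse."""
        self.saved_line = source[self.line_no]
        self.saved_attrs = symbols.get(self.var)
        self.code_handle = table.add(f"<code for {self.saved_line!r}>")

    def apply(self, table, image):
        image.append(table.fetch(self.code_handle))


def process(inc, source, symbols, table, image):
    """Return True when the saved code was reused without recompiling."""
    reused = inc.check(source, symbols)
    if not reused:
        inc.compile(source, symbols, table)
    inc.apply(table, image)
    return reused


source = ["x = x + 1;"]
symbols = {"x": "int"}
table = CodeIncrementTable()
inc = SemanticIncrement(0, "x")
first = process(inc, source, symbols, table, [])   # nothing saved yet
second = process(inc, source, symbols, table, [])  # line and attributes unchanged
symbols["x"] = "float"
third = process(inc, source, symbols, table, [])   # attribute of x changed
```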
All code emitted by incremental compilers 11 is
self-checking for array and pointer bounding.

A symbol is a name used in the source program
together with all the information necessary to interpret its
use in the program. The traditional mechanism for
associating the occurrence of a name with a particular
symbol is a symbol table or decorated abstract syntax
tree. Because of contextual interpretation (such as scope
rules in C, or position of definition in Fortran) each
symbol name must be interpreted within context. Thus,
information about each symbol may include the
symbol's name, its data type, usage (e.g., procedure,
variable, or label), its address, etc.

A conventional symbol table is accessed via methods
which enter, check and retrieve information. An
incremental symbol table 56 must behave conventionally but
also save information for reuse across scope closure,
and even end-of-compilation. An implementation
consistent with RCASE is to provide a warm/cold (validity)
bit on each individual item of information (or
attribute), the symbol as a whole, the local scope as a whole,
and the symbol table as a whole.

For use in RCASE the symbol table 56 must include
more specific information than typically found in batch
compilers. In particular, absolute storage addresses are
allocated to global variables.

Referring to FIG. la, the fields for an entry in an 
incremental symbol table 56 are illustrated in accor- 
dance with one embodiment of the present invention. 
The first field contains an index to the token table where
the symbol name has been stored. The block signature
field is used to define a context frame. The context
frame serves the purpose of limiting the scope of a name
so that the same name can be used in a different context
for a different purpose without the reference to the
name becoming ambiguous. Consequently, different
procedures can define and use the same name in
different ways. The validity bits, attributes, and address fields
have been discussed previously.

When a new compilation of an application source
module is begun, the fully developed symbol table 56
from a previous compilation is present (valid
information is never discarded from the symbol table 56), with
all validity bits set to false. The cold items can then be
warmed up (i.e., the validity bit is set) and reused during
recompilation. For example, as the compile progresses,
symbol names and associated information are
generated. But before this information is entered into the
symbol table 56, the table is checked (via a journal
action) to determine if that information already exists
within the table from a previous compile. If a particular
name already exists in the symbol table, its validity bit is
set to warm and inquiries are made to determine if its
attributes are the same as they were after the previous
compile. If an attribute is unchanged, its validity bit is
set to warm. This process continues until an attribute is
found that has changed or until a new entry is found.
Once this happens, everything that depends on the
changed or new information must be recompiled.
However, up to this point, the compiler can skip over the
clean or unchanged increment without recompiling
code unnecessarily. The clean increment does not need
to be parsed because the information that would result
from the parse was saved during the previous compile
and can now be re-used. To further increase speed and
efficiency, as mentioned before, memory for a symbol
table 56 is allocated contiguously for a single source
module 12 to avoid thrashing.
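
The warm/cold discipline described above might be sketched as
follows. This is a simplified illustration with assumed names
(IncrementalSymbolTable, begin_compile, enter); it keeps one
validity bit per symbol and per attribute only, and omits the
scope-level and table-level bits and the block signature field.

```python
class IncrementalSymbolTable:
    def __init__(self):
        # name -> {"attrs": {...}, "warm": bool, "attr_warm": {...}}
        self.entries = {}

    def begin_compile(self):
        """Start of a new compilation: keep every entry, mark it all cold."""
        for entry in self.entries.values():
            entry["warm"] = False
            entry["attr_warm"] = {key: False for key in entry["attrs"]}

    def enter(self, name, attrs):
        """Warm up or create an entry; True means nothing changed."""
        entry = self.entries.get(name)
        if entry is None:
            self.entries[name] = {
                "attrs": dict(attrs),
                "warm": True,
                "attr_warm": {key: True for key in attrs},
            }
            return False  # new entry: dependents must be compiled
        entry["warm"] = True
        unchanged = set(entry["attrs"]) == set(attrs)
        for key, value in attrs.items():
            if entry["attrs"].get(key) != value:
                unchanged = False
                entry["attrs"][key] = value
            entry["attr_warm"][key] = True
        return unchanged


table = IncrementalSymbolTable()
table.enter("x", {"type": "int", "usage": "variable"})
table.begin_compile()
same = table.enter("x", {"type": "int", "usage": "variable"})
table.begin_compile()
changed = table.enter("x", {"type": "float", "usage": "variable"})
```

A True result from enter( ) corresponds to the case in which the
increments depending on the symbol can be skipped; a False result
corresponds to the point at which dependents must be recompiled.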
Journalling is an important feature. The underlying
technique of incremental compiling as herein discussed
is selective journalling of interactions across the
interfaces between modules of the compiler 11 of FIG. 7.
Whenever the compiler 11 can record its response to a
chunk of input as a sequence of actions, there is
potential to play the actions out rather than recompute them.
This is especially attractive when the former is many
times faster to perform than the latter. The simplest
example is the result of scanning a line of text. The
journal is the sequence of tokens that is associated with
the corresponding source text. The token list is saved
within the scanner 65 in lexical increments in table 67
(also seen in FIG. 9a) and tokens are returned, one at a
time, on demand, to the parser 70. The journalling is
implemented on two levels. Token handles are saved in
the lexical increment's token list. Token details are
saved in the token tables 66. Information on a given
token can be accessed from the token table 66 using the
token handle.
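
The two-level token journal can be illustrated with a short Python
sketch. The names TokenTable, intern and scan_line are assumptions
of this sketch, a whitespace tokenizer stands in for a real scanner,
and a Python list stands in for the string table; the point shown is
only that each unique lexeme is stored once and that the per-line
journal holds integer handles rather than text.

```python
class TokenTable:
    """Stores each unique lexeme once and names it by an integer handle."""

    def __init__(self):
        self.strings = []  # one entry per unique lexeme
        self.handles = {}  # lexeme text -> handle (index into self.strings)

    def intern(self, text):
        if text not in self.handles:
            self.handles[text] = len(self.strings)
            self.strings.append(text)
        return self.handles[text]

    def text(self, handle):
        """Access function: callers never index the table directly."""
        return self.strings[handle]


def scan_line(line, table):
    """Journal a line as a list of token handles."""
    return [table.intern(word) for word in line.split()]


table = TokenTable()
journal = {}  # line number -> saved token handle list
lines = ["x = x + y ;", "y = x ;"]
for number, line in enumerate(lines):
    journal[number] = scan_line(line, table)
```

Comparing two tokens then reduces to comparing two integers, and
replaying a line to the parser needs only the saved handle list.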

Another example of a journal is the semantic
increment table 76. Each semantic increment journals its
effects on the symbol table 56 or emitted code of table
73 (including information on how to validate itself).
Associated code is journalled in code increment tables
73, and its location is represented as a handle in the
semantic increment. There are many other potential
candidates for journals. The choice of implemented
journals is up to the individual compiler, based on an
analysis of the cost/benefit of each journal.

The limitation on this technology is that the
increments can only be associated with line-blocks of text
and the journalling must be sufficiently simple to make
the playback (much) more efficient than recomputing
the effect. There is no expectation that every speedup
will in fact be cost-effective.

Journalling is more effective when the actions record
activity across a concise and well-defined interface.

Each journal is valid under a set of language-specific
conditions, checks for which become part of the
journal. One universal condition is that the corresponding
source text of module 12 has not changed, information
which RCASE 21 provides to the scanner 65 of the
compiler 11 through cleanline increments from tables
50.

A journal can be optimized for a given increment of
source text. For example, where a traditional compiler
might repeatedly look up an occurrence of a variable in
the symbol table 56, the incremental compiler 11 can
look it up once, insure that the previous attributes have
not changed, then accept all actions relevant to the
variable for the entire increment without further
checking. For an "if" statement, once the variables have been
checked for validity, the final target code can be played
back as a single action rather than as a series of smaller
emitter and fixup actions.

The use of journalling entails the following
responsibilities:

(a) the producer of the journal must provide all the
necessary access functions to manipulate its journal
entries (create, delete, validate, iterate, etc.).

(b) each journal must be allocated on a
per-source-buffer 12 basis to avoid thrashing.

(c) a journal should provide check-pointable handles
to its entries.

(d) a journal entry should only be interpreted through
the access functions provided by the journal
producer.
Context switching and checkpointing is another
feature of importance. In a session, the developer will
compile many different source buffers 12. Since
RCASE requires that an incremental compiler 11 be
callable, it will be able to manage the processing of
multiple modules. The incremental compiler 11 must be
capable of saving information, hiding that information
while switching context to a new module 12, and then
uncovering the saved information when the context is
switched back. This in turn implies that all of the
internal modules of the incremental compiler (e.g. scanner
65, generator 71, emitter 72) must be able to support
context switching of their respective saved-states (e.g.
lexical increment tables 67, semantic increment tables
76, etc.).

To support context switching, incremental compilers
11 must support the following capabilities accessible to
RCASE:

SetContext(h): where h is a "context handle",
    allowing the callable compiler 11 to save state for more
    than one compilation unit at a time, and switch
    between contexts on demand. The current context
    affects the memory allocation scheme as well as
    defining the meaning of compiler-supplied services
    such as Checkpoint( ).
For example, consider the table 66 describing token
values. There may be more than one module 12 being
processed by the environment. The scanner 65, in
addition to saving the token values, must be able to reveal
and conceal a particular token table 66 upon receiving a
request to SetContext(h) for context h.
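
A hedged sketch of how a scanner might reveal and conceal its
per-module saved state on SetContext(h) is given below. The class
CallableScanner, the dictionary keyed by context handle, and the
helper methods add_token and token_count are assumptions of this
sketch; string keys stand in for context handles.

```python
class CallableScanner:
    def __init__(self):
        self._contexts = {}  # context handle -> saved state for one module
        self._current = None

    def set_context(self, handle):
        """Conceal the current module's tables and reveal those for `handle`."""
        self._current = self._contexts.setdefault(
            handle, {"token_table": [], "lexical_increments": []})

    def add_token(self, text):
        table = self._current["token_table"]
        table.append(text)
        return len(table) - 1  # handle within the current context only

    def token_count(self):
        return len(self._current["token_table"])


scanner = CallableScanner()
scanner.set_context("module_a")
scanner.add_token("x")
scanner.add_token("=")
scanner.set_context("module_b")
scanner.add_token("y")
count_b = scanner.token_count()
scanner.set_context("module_a")  # the earlier state is uncovered intact
count_a = scanner.token_count()
```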

In addition, RCASE provides session-support to
allow the developer to save the current environment
and resume it at a later time (checkpoint/restart). This
requires each compiler module to be able to write its
saved state to a file, and read and restore the same saved
state. This requirement suggests an implementation that
does not use machine addresses within, or to describe,
saved information (e.g. one that uses handles instead).

To support checkpoint/restart, incremental
compilers 11 must support the following capabilities accessible
to RCASE 21:

Checkpoint( ): this entry activates the compiler's
    checkpoint facility which will record onto a file all the
    relevant state information for the current source
    module 12. The return value is the pathname for the
    checkpoint file (note: SetContext( ) is used to iterate
    through multiple modules 12).
Restart(n): where n is the pathname for a checkpoint
    file, activates the compiler's restart facility which
    will restore all the data structures required for
    incremental compilation from the contents of the
    checkpoint file. Typically Restart is invoked from
    the operating system command line that activates
    RCASE rather than from within RCASE as
    indicated here.

In addition, RCASE will supply to the compilers 11
the following capabilities for generating unique
checkpoint file names:
ModuleName( ): returns the name of the current
    source module 12.
ProjectName( ): returns the name of the current
    project.
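
The following Python sketch illustrates checkpoint and restart of
handle-based saved state. It is an assumption-laden example rather
than the described facility: JSON is used as the file format purely
for convenience, the ".ckpt" naming scheme built from project and
module names is hypothetical, and the functions take the names as
arguments instead of calling ProjectName( ) and ModuleName( ).
Because the saved state contains only indices, it can be written
and read back without any address translation.

```python
import json
import os
import tempfile


def checkpoint(state, project, module, directory):
    """Write the saved state for one module; return the checkpoint pathname."""
    path = os.path.join(directory, f"{project}.{module}.ckpt")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(state, handle)
    return path


def restart(path):
    """Rebuild the saved state from the contents of a checkpoint file."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


state = {
    "strings": ["x", "=", "1", ";"],       # token table contents
    "lexical_increments": [[0, 1, 2, 3]],  # token handles, not addresses
}
with tempfile.TemporaryDirectory() as directory:
    path = checkpoint(state, "demo_project", "main", directory)
    restored = restart(path)
```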

The incremental linker 15 receives as its input the
code-data-symbol buffers 14, which as explained above
include the symbol tables 56, code increment tables 73,
link tables 58, etc. The linker 15 is incremental in the
sense that increments from the code-data-symbol
buffers 14 which have not been changed since the previous
linking operation, and are not dependent upon changed
elements, are reused. One of the features which allows
the incremental linker to operate much faster is that the
code-data-symbol tables 14 are in virtual memory rather
than being in files; all of the complexities of file systems
and file formatting, as well as the time of saving and
restoring, are not present. Another feature is that the
environment is creating non-optimized code, and thus
the code (as embodied in the code data symbol tables, or
as in the executable code tables plus link lists) is very
simple, making the task of linking a less complex one.

The compiler produces the need and supply data of
the link tables 58 for each module. These tables have
"new", "old", "delete" tags 59 for each entry 57, and
the linker 15 uses these to update only changed data in
changed modules, i.e., if a module hasn't been changed
then it need not be updated, and in a module that has
been changed it is only those need or supply items that
have been changed that need to be updated. The foreign
addresses in run-time libraries 24 will remain constant,
ordinarily, from one cycle to the next. So, the compiler
11 produces a table 58 for each module 12, and the
incremental linker 15, when the link function is invoked,
generates or updates a global link table which is a
combination of all of the tables 58. This global link table
contains an entry for each need or definition (supply) in
any of the modules, and this table is held in memory
from one cycle to the next so it need not be formatted as
a file and written to storage then later recalled. The
in-memory character of the table, and the fact that the
bulk of the entries need not be regenerated, greatly speed
up the link part of the turnaround.
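
The tag-driven update of the global link table described above can be sketched in Python as follows. This is a minimal model under stated assumptions, not the linker 15 itself: the `LinkEntry` fields, the dictionary keyed by (module, name, kind), and the function names `update_global_table` and `resolve` are all hypothetical, chosen only to show how "new", "old" and "delete" tags let unchanged entries be skipped.

```python
from dataclasses import dataclass


@dataclass
class LinkEntry:
    name: str      # symbol being needed or supplied
    kind: str      # "need" or "supply"
    address: int   # location within the module's code table
    tag: str       # "new", "old", or "delete"


def update_global_table(global_table, module, entries, changed):
    """Merge one module's link table into the global link table.

    global_table maps (module, name, kind) -> address and is kept in
    memory from one cycle to the next. Returns the number of entries
    that actually had to be touched.
    """
    if not changed:
        return 0                      # unchanged module: reuse everything
    touched = 0
    for e in entries:
        if e.tag == "old":
            continue                  # entry unchanged since the last link
        key = (module, e.name, e.kind)
        if e.tag == "delete":
            global_table.pop(key, None)
        else:                         # "new"
            global_table[key] = e.address
        touched += 1
    return touched


def resolve(global_table):
    """Match every need against the supply of the same name."""
    supplies = {name: addr
                for (_mod, name, kind), addr in global_table.items()
                if kind == "supply"}
    return {(mod, name): supplies.get(name)
            for (mod, name, kind) in global_table
            if kind == "need"}
```

In this model a second link cycle that changes one supply in one module touches a single entry; every other entry, including all entries of unchanged modules, is reused as it stands.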

These features illustrate the extraordinary demands
RCASE makes on memory, and thus the importance
that the memory management methods described above
have for operation. Not only is the executable code
for RCASE 21 itself always present, but an editor 10
and all the source code under development (as modules
12), at least one compiler 11, the debugger 22, a linker
15 and builder are simultaneously active. The collection
of modules must interact with the virtual memory system
in a way that avoids thrashing. The natural phasing
of activity in the edit-compile-link-run cycle allows the
host memory system to manage efficiently the overlay
of the executable components of a running RCASE
session. Most components that save state deal with only
one source module at a time, leading to a requirement to
allocate memory on a per-source-buffer basis. This can
be done with zones, or in a more primitive fashion with
the well known procedures malloc( ) and realloc( ). The
incremental compiler 11 must be reasonable in the
allocation of memory.
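
To make the idea of per-source-buffer allocation concrete, the following Python sketch models a zone as a simple bump allocator, with one zone per source module. It is an illustration under assumptions only: the `Zone` class, its methods, the 64-byte initial size, the doubling growth policy (standing in for realloc( )), and the `zone_for` helper are hypothetical and are not taken from the description.

```python
class Zone:
    """Bump allocator holding all saved state for one source buffer."""

    def __init__(self, initial_size=64):
        self._store = bytearray(initial_size)
        self._used = 0

    def alloc(self, size):
        """Reserve `size` bytes and return their offset within the zone."""
        needed = self._used + size
        if needed > len(self._store):
            # Grow geometrically, in the spirit of realloc( ).
            new_size = max(needed, 2 * len(self._store))
            self._store.extend(bytes(new_size - len(self._store)))
        offset = self._used
        self._used = needed
        return offset

    def write(self, offset, data):
        """Copy `data` into previously allocated space."""
        self._store[offset:offset + len(data)] = data

    def read(self, offset, size):
        """Return `size` bytes starting at `offset`."""
        return bytes(self._store[offset:offset + size])

    def reset(self):
        """Release every allocation in the zone at once."""
        self._used = 0

    @property
    def used(self):
        return self._used


zones = {}  # one zone per source buffer, keyed by module name


def zone_for(module):
    """Return the zone for a module, creating it on first use."""
    return zones.setdefault(module, Zone())
```

Because each module's state lives in its own contiguous store, work on one source module tends to touch only that module's pages, which is the property that helps the collection of components avoid thrashing.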

The foregoing description of the preferred embodi- 
ment of the invention has been presented for the pur- 
poses of illustration and description. It is not intended to 
be exhaustive or to limit the invention to the precise 
form disclosed. Many modifications and variations are 
possible in light of the above teaching. It is therefore 
contemplated that the appended claims will cover any 
such modifications or embodiments as fall within the 
scope of the invention. 

What is claimed is: 

1. A method of developing source code, comprising 
the steps of: 

a) editing selected lines of each of a plurality of
modules of source text using an editor, said modules of
source text being stored in memory in a plurality of
source-text buffers, each one of the source-text
buffers containing a plurality of lines of said source
text;

b) compiling each of said source-text buffers to
produce in memory a plurality of code tables, one code
table for each one of said buffers;

c) linking said code tables to produce in memory a
link table and storing said code tables and said link
table as a code image in a buffer in memory;

d) running the code from said code image in said
buffer; and

e) holding said source-text buffers, said code tables,
and said link table, in memory, for re-using;

f) re-editing other selected lines of each of said
plurality of modules of source text using said editor, said
re-edited modules of source text being again stored
in memory in said plurality of source-text buffers;

g) re-compiling each of said re-edited source-text
buffers to again produce in memory a plurality of
updated code tables;

h) re-linking said updated code tables to again
produce in memory an updated link table and storing
said updated code tables and said updated link table
as an updated code image in a buffer in said memory;

i) again running the code from said updated code
image in said buffer;
wherein each line of said source-text buffers has an
associated change-tag to indicate whether or not said
line of source code is changed in said steps of editing
and re-editing; and said steps of compiling and
re-compiling only compiles lines of said source text buffers
indicated to have been changed by said change-tag of
each line.

2. A method according to claim 1 wherein said
compiling includes generating code for said code table for
executable statements and creating symbol tables in
memory for declarative statements in said source text,
each entry in the symbol tables corresponding to one of
said declarative statements.

3. A method of developing source code, comprising
the steps of:

a) editing selected lines of each one of a plurality of
modules of source text and storing edited source
text in memory in a plurality of source-text buffers
wherein each module of source text has a plurality
of lines, each line having an associated change-tag
indicating whether or not the line has been
changed since a previous step of editing each said
module;

b) compiling said plurality of source-text buffers to
produce a plurality of code tables in memory;

c) linking said code tables to produce a link table and
storing said code tables and said link table as a code
image in a buffer in memory;

d) running the code from said code image in said
buffer in memory;

e) re-editing other selected lines of each of said
plurality of modules of source text to produce re-edited
modules of source text, and again storing in memory
said re-edited modules in said plurality of
source-text buffers;

f) re-compiling each of said re-edited source-text
modules to produce in memory a plurality of
updated code tables; said steps of compiling and
re-compiling only compiles lines of said source text
buffers indicated to have been changed by said
change-tag of each line;

g) re-linking said updated code tables to again
produce in memory an updated link table, and storing
said updated code tables and said updated link table
as an updated code image in a buffer in said memory;

h) again running the code from said updated code
image in said buffer;

i) said steps of compiling and re-compiling including
generating said code tables for executable statements
of said source text and generating in memory sym- 
bol tables for declarative statements of said source 
text, each entry in the symbol table corresponding 
to one of said declarative statements; 

j) said steps of linking and re-linking including gener- 
ating said link table and updated link table in mem- 
ory where each entry in said link table and updated 
link table includes an identification and location of 
an element needed by or supplied by said code 
tables, and including matching said elements 
needed or supplied in said link table and updated 
link table and generating in memory a link list and 
updated link list used in said steps of running and 
again running. 

4. A method according to claim 3 wherein each of 
said steps of compiling, linking and running is halted to 
return an error indication if an error is detected, with- 
out completing the sequence of said steps, but saving 
said source-text buffers, said code tables, said symbol 
table and said link table in virtual memory. 

5. A method according to claim 4 wherein said 
source-text buffers, said code tables, said symbol table, 
and said link table saved in virtual memory are paged in 
and out of real memory upon demand. 

6. A method according to claim 3 wherein said 
source-text buffers, said code tables, said symbol table, 
and said link table are each saved in contiguous pages in 
virtual memory and are paged in and out of real mem- 
ory upon demand. 

7. A method of developing source code, comprising 
the steps of: 

a) editing a plurality of modules of source text, said 
source text for each module being held in memory 
in one of a plurality of separate source-text buffers 
as a plurality of lines; 

b) compiling said source-text buffers to produce in 
memory a plurality of code tables, one for each 
source-text buffer; 

c) linking each of said code tables to produce a link 
table in memory; 

d) running the code from said code tables and link 
table in memory; 

e) said step of compiling including, for each one of
said source-text buffers, generating said code tables
for executable statements in said source text and
generating symbol tables in memory for declarative
statements in said source text, each entry in the
symbol table corresponding to one of said declarative
statements;

f) said step of linking including generating said link
table in memory where each entry in said link table
is an identification and location of an element
needed by or supplied by said code table.

8. A method according to claim 7 wherein each one 
of said source-text buffers includes in memory an entry 
for each line of source code and a change-tag for said 
line indicating whether or not said line has been 
changed since the last time the compile step has been 
performed. 



9. A method according to claim 8 including the step 
of detecting semantic increments in said lines of source 
text and storing in memory identification of each said
semantic increment, and for each semantic increment
checking via said change-tag in memory to see if the
line or lines including said statement have been
changed, and, if not, skipping said semantic increment
and reusing code from said code table in memory.

10. A method according to claim 9 including the step
of generating in memory for each source-text buffer a
descriptor table including for each line of said source
code a table entry having a locator for said line and an
indication of the number of clean lines following said
line.

11. A method according to claim 10 including the 
step of checking said descriptor table in memory to see
if the number of clean lines is greater than the number of
lines of said semantic increment.

12. A method according to claim 7 wherein each of 
said steps of compiling, linking and running is halted to 
return an error indication if an error is detected, with- 
out completing the sequence of said steps. 

13. A method according to claim 7 wherein said 
source-text buffer, said code table, said symbol table and 
link table are all saved in memory after said steps of
compiling, linking and running.

14. A method according to claim 13 wherein said 
source-text buffer, said code table, said symbol table and
said link table for all of said modules are all saved in 
virtual memory after said steps of compiling, linking 
and running, and paged in and out of real memory upon 
demand. 

* * * * *